1. Introduction
Online social networks have become integral components of contemporary digital society, facilitating communication, collaboration, and information exchange across diverse populations [1]. Platforms such as Facebook, Twitter, and Instagram generate vast volumes of interaction data, offering opportunities to study complex behaviours, group dynamics, and emergent societal phenomena [2]. Extracting meaningful insights from such heterogeneous and dynamic datasets requires computational approaches capable of modelling latent structures, behavioural trends, and relational dynamics [3].
Pattern detection in social networks is central to identifying communities, influence propagation, trust links, and information diffusion pathways [3]. These insights support applications such as recommendation systems, misinformation detection, and public health surveillance. However, traditional algorithms often struggle with the scale, noise, and multimodal nature of online data [4], and the temporal evolution of user behaviour demands adaptive and scalable solutions.
Recent advances in graph theory, machine learning, and natural language processing have opened new avenues for analysing large-scale social data [5,6]. Graph Neural Networks (GNNs), attention-based models, and temporal embeddings capture evolving topologies and semantic heterogeneity, while hybrid approaches such as trust propagation models [7], matrix factorisation [8], and neural frameworks [9] further illustrate the utility of machine learning in complex network settings.
Despite these advances, a key gap remains in modelling higher-order constructs such as social attachment and digital trust. While sociological and psychological studies underscore the role of trust in enabling cooperation and community stability [10,11], their insights have yet to be fully operationalised in computational frameworks. For example, Donath [12] highlighted the importance of online identity cues in shaping trustworthiness, but algorithmic implementations remain limited. Moreover, the relationship between affective signals—such as emotional tone, intimacy, and communication frequency—and bond strength has been underexplored. Prior studies also indicate that trust and privacy concerns are linked to risk perception and communication style [13], offering behavioural dimensions that remain underutilised computationally [14].
This paper addresses this gap by introducing a data-driven framework for modelling social attachment using behavioural features extracted from Facebook activity. Building on prior research in trust dynamics [15,16], we examine the predictive power of machine learning models in assessing interpersonal connection strength. Specifically, we investigate how temporal interactions, emotional sentiment, and public communication contribute to perceived closeness, with applications in digital mental health monitoring and social support detection. Our methods combine the expressive power of neural networks with traditional classifiers, comparing their effectiveness in predicting attachment strength from both structured and unstructured interaction data.
Despite the extensive body of research on trust prediction and community detection, existing studies have rarely incorporated higher-order relational constructs such as social attachment, intimacy, and emotional valence into large-scale computational frameworks. This omission limits the ability of prior models to capture the nuanced psychological and affective dimensions of online relationships. Our work directly addresses this gap by introducing attachment-oriented scoring functions that integrate temporal, emotional, and interactional cues, and by demonstrating their utility in both predictive modelling and behavioural segmentation.
The contributions of this work are threefold. First, we propose a dual attachment scoring mechanism that integrates temporal recency, emotional valence, and intimacy features, providing a psychologically grounded measure of tie strength. Second, we design an evaluation pipeline that combines supervised classification with unsupervised clustering, showing how attachment scores enhance predictive accuracy while uncovering latent behavioural segments. Third, we incorporate emotional and temporal signals as first-class features, bridging affective computing with social network analysis. Together, these innovations establish a robust framework for inferring mental state indicators from online behaviour, offering both theoretical insights and practical pathways for digital mental health applications.
The remainder of this paper is organised as follows. Section 2 reviews the existing literature on trust prediction, social attachment, and behavioural analysis. Section 3 introduces the methodological framework, integrating graph-based, machine learning, and deep learning approaches. Section 4 details the implementation environment, feature extraction processes, attachment scoring functions, and model configurations. Section 5 presents the experimental evaluation, followed by discussion. Finally, Section 6 summarises the contributions, highlights applications, and outlines future research directions.
2. Related Work
Research on trust prediction in online social networks has evolved significantly over the past two decades, driven by the proliferation of user-generated content and complex interaction patterns. Early efforts drew on sociological and psychological theories of interpersonal trust [10,11], which laid the foundation for computational approaches. As online platforms gained prominence, trust began to be quantified through behavioural proxies such as interaction frequency, reciprocity, and endorsement patterns [12,17]. These insights motivated diverse modelling strategies, including graph-based inference [18], probabilistic methods [19], and early machine learning classifiers [20,21].
Several studies leveraged machine learning to incorporate structured and unstructured data into trust prediction. For instance, content-based and review-driven features were employed in [22,23,24], while refs. [8,25] integrated temporal patterns and reputation scores. Neural methods have further expanded these capabilities, with ref. [26] combining Dempster–Shafer theory with neural networks and ref. [9] applying artificial neural networks for predictive trust inference. Attention-based mechanisms have been used in context-aware settings [27], enabling adaptation to dynamic behaviours and shifting interaction patterns.
Graph structures remain central to trust modelling, as they capture relational dependencies and propagation dynamics. Notable examples include TrustWalker [8] and CommTrust [28], which leverage both user–item and trust graphs. Other approaches, such as that in [29], estimated trust via propagation and similarity measures, while refs. [16,30] refined these models with contextual cues and adaptive weighting. Social influence and homophily effects, examined in [31,32], have also been integrated into recommender systems and friend suggestion mechanisms.
Parallel to trust modelling, tie strength and attachment prediction have been studied through emotional and communicative features. For example, refs. [33,34] demonstrated how tie strength shapes user well-being and interaction intensity. Sentiment analysis, language use, and frequency of communication have been applied as proxies of closeness [35,36]. Privacy-aware frameworks [13,37] further stress ethical considerations in modelling sensitive interpersonal dynamics. Despite these efforts, the integration of affective signals and trust estimation remains limited.
More recent work combines statistical models with deep learning to address complex social behaviours. Hybrid frameworks in [9,38,39] incorporate context-awareness, temporal evolution, and multimodal signals. Studies such as refs. [4,22,40] demonstrate simultaneous mining of trust links and influence patterns, while ref. [41] investigates trust evolution over time. In parallel, diffusion models such as the Independent Cascade [42] and Linear Threshold [43] have been foundational in simulating influence propagation.
Beyond computational models, longitudinal studies emphasise that shifts in engagement reflect deeper psychological transitions [44]. The concept of multiplexity, highlighting overlapping relational contexts, was introduced in [45], further informing tie strength and trust diffusion. Several surveys, including refs. [1,16], have summarised advances in trust modelling while noting challenges such as sparsity, cold-start effects, and the complexity–interpretability trade-off. Deep learning architectures, including GNNs and transformers, have been proposed to address these limitations [46].
Recent work has increasingly explored multimodal approaches for social network analysis and mental health detection. One line of research has systematically reviewed multimodal sensing methods for mental health assessment, highlighting how integrating heterogeneous data sources improves detection accuracy [47]. Complementary studies have emphasised the role of multimodal information, such as combining text, behavioural traces, and physiological signals, in screening for depression and related disorders [48]. Beyond text and activity data, voice-based models have also been shown to provide valuable cues for depression recognition, particularly when pre-training strategies are applied [49]. In parallel, reviews of AI applications on social media have demonstrated the potential of machine learning for analysing mental health conditions at scale, while also stressing ethical and interpretive challenges [50].
Despite progress, many open challenges remain in modelling fine-grained constructs such as social attachment, where emotional, linguistic, and temporal cues must be jointly considered. Existing methods often capture either structural trust or affective signals in isolation. Our study contributes to bridging this gap by fusing trust prediction with attachment estimation on real-world Facebook data, using a diverse set of behavioural features. By building on prior work, we extend trust research into the domain of emotional closeness and social support detection, offering a machine learning framework for inferring mental state indicators in online networks.
3. Methodological Framework for Pattern Detection in Online Social Networks
The detection of meaningful patterns in online social networks presents a multifaceted challenge, requiring a systematic methodological approach that integrates data acquisition, representation, algorithm selection, and feature modelling. As digital interactions generate increasingly complex and large-scale datasets, the need for robust and scalable analytical frameworks has intensified. This section outlines the methodological foundation adopted in this study for detecting behavioural and structural patterns in social network data. Emphasis is placed on the integration of traditional graph-based analysis with modern machine learning and deep learning techniques, aiming to model phenomena such as trust propagation, attachment strength, and community structure. The proposed framework encompasses all stages of the computational pipeline—from ethical data collection and graph construction to feature extraction, algorithmic learning, and visual interpretation—thereby enabling a comprehensive and reproducible approach to social network analysis.
3.5. Algorithmic Strategies for Pattern Detection
Once the social network is represented as a graph, the selection of appropriate algorithms becomes a critical step in uncovering meaningful patterns. These algorithmic strategies must align with the specific analytical goals—such as detecting communities, identifying influential users, or uncovering anomalies—as well as with the properties of the underlying data, including network density, node attributes, and temporal dynamics [28,62].
Pattern detection tasks in online social networks are commonly approached using both supervised and unsupervised learning techniques. Supervised models—such as logistic regression, support vector machines, and ensemble classifiers—are well suited to tasks like trust prediction or attachment classification where labelled data are available [8,9]. In contrast, unsupervised methods—including clustering, dimensionality reduction, and graph partitioning—are effective for exploratory analysis when prior knowledge is limited or when latent structures must be inferred directly from raw network topologies [63,64].
Community detection remains one of the most widely studied problems in network science. Algorithms such as the Louvain Method, Label Propagation, and DBSCAN reveal cohesive subgroups by exploiting modularity, density, or iterative label convergence [65,66]. These techniques are particularly valuable for identifying latent social structures, affiliation groups, or clusters of users that exhibit similar interaction behaviours. In our framework, community detection complements attachment estimation by highlighting user clusters with shared intimacy cues or communication frequency.
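As a concrete illustration, the sketch below applies two of the community detection algorithms mentioned above using NetworkX; the toy graph is a placeholder rather than the actual Facebook interaction graph used in this study.

```python
# Hedged sketch: community detection on an interaction graph with NetworkX.
# The karate club graph is a stand-in for the real Facebook interaction graph;
# in practice, edges would carry interaction weights.
import networkx as nx

G = nx.karate_club_graph()  # placeholder graph

# Louvain method: modularity-based partitioning (requires NetworkX >= 2.8)
louvain_parts = nx.community.louvain_communities(G, seed=42)

# Label propagation: communities emerge through iterative label convergence
label_parts = list(nx.community.label_propagation_communities(G))

print(f"Louvain found {len(louvain_parts)} communities")
print(f"Label propagation found {len(label_parts)} communities")
```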
Identifying influential or central users is another crucial aspect of trust and attachment modelling. Centrality-based measures—including degree, betweenness, and eigenvector centrality—provide baseline indicators of influence and information diffusion. These measures can be extended with learning-based approaches that incorporate node attributes, edge weights, and temporal frequency of interactions, thereby offering more nuanced estimates of user impact on trust propagation and social bonding [15,41].
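For reference, a minimal sketch of these baseline centrality measures is shown below; the example graph and the simple averaging of the three measures are illustrative assumptions, not the weighting used in our pipeline.

```python
# Hedged sketch: baseline centrality indicators for identifying influential users.
import networkx as nx

G = nx.karate_club_graph()  # placeholder for the user interaction graph

degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)
eigenvector = nx.eigenvector_centrality(G, max_iter=1000)

# Rank users by a simple (illustrative) average of the three normalised measures
combined = {n: (degree[n] + betweenness[n] + eigenvector[n]) / 3 for n in G.nodes}
top_users = sorted(combined, key=combined.get, reverse=True)[:5]
print("Most central users:", top_users)
```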
In dynamic environments, temporal pattern detection algorithms are essential for capturing evolving behaviours. Approaches such as time-aware clustering, sliding-window analysis, and recurrent models (e.g., LSTMs) allow researchers to track behavioural drift, sentiment shifts, and the rise or decay of trust and attachment over time [34,67]. Such methods are particularly important for monitoring online signals of emotional well-being or relationship closeness, which can fluctuate rapidly in response to social or contextual changes.
In this context, we use the term temporal pattern detection algorithms to refer to computational methods that explicitly capture sequential or time-dependent changes in user behaviour and network structure. Examples include sliding-window analysis, time-aware clustering, and recurrent neural networks such as LSTMs. By focusing on temporal dependencies, these algorithms ensure that shifts in engagement, emotional tone, or relational closeness are incorporated into the analysis rather than treated as static phenomena.
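A minimal sketch of windowed temporal feature extraction is given below; the column names and the 30-day tumbling windows are illustrative assumptions (a strict sliding window would use overlapping intervals), not the exact configuration of our pipeline.

```python
# Hedged sketch: windowed aggregation of message activity and sentiment over time.
# Column names ("timestamp", "sentiment") are hypothetical.
import pandas as pd

messages = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-02", "2023-01-05", "2023-02-10", "2023-03-01"]),
    "sentiment": [0.4, 0.1, -0.2, 0.6],
})

windowed = (
    messages.set_index("timestamp")
            .resample("30D")["sentiment"]            # 30-day windows (tumbling, for simplicity)
            .agg(["count", "mean"])
            .rename(columns={"count": "msg_count", "mean": "mean_sentiment"})
)
print(windowed)
```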
Overall, the choice of algorithmic strategy directly influences the interpretability and robustness of results. By combining statistical methods, graph-based techniques, and machine learning models, it becomes possible to detect patterns that are not only accurate but also actionable for inferring social attachment and mental state indicators in online networks.
4. Implementation and Experimental Setup
This section presents the complete technical framework designed to model social attachment, mental state indicators, and trust dynamics from Facebook user activity. The system integrates natural language processing (NLP), machine learning (ML), graph-based analysis, and time-series modelling to derive insights from users’ behavioural, emotional, and relational patterns. The pipeline comprises five core stages: data preprocessing, behavioural feature extraction, scoring function computation, model-based classification and clustering, and evaluation through multiple metrics. Emphasis was placed on interpretability, robustness, and reproducibility.
5. Experimental Evaluation
This section presents a comprehensive evaluation of the proposed framework for modelling social attachment and inferring mental state indicators from Facebook activity. The evaluation consists of three components: (i) computation of attachment strength between users based on behavioural and emotional metrics, (ii) classification of users into mental state categories using a wide range of machine learning algorithms, and (iii) unsupervised clustering to reveal latent behavioural groupings. A final discussion interprets the findings in relation to social and psychological theory, compares algorithmic approaches, and reflects on the broader implications. The aim is to provide both quantitative results and qualitative insights into how social media activity reflects mental health patterns.
To ensure that the evaluation of attachment-based features was not biased toward a single model family, we systematically benchmarked a diverse set of machine learning algorithms. This included traditional statistical learners (e.g., Logistic Regression, Naïve Bayes), tree-based ensembles (e.g., Random Forest, XGBoost, LightGBM), and modern neural architectures (e.g., LSTM, BERT). The motivation for this breadth is twofold: (i) to assess the generalizability of the proposed attachment scoring functions across fundamentally different algorithmic paradigms, and (ii) to identify which model families are most sensitive to temporal and emotional cues. Such comprehensive evaluation highlights the robustness of our formulation while providing a fair comparison between interpretable models and deep learning methods.
5.1. Attachment Strength Calculation
Attachment strength was operationalised through two alternative formulations that combine behavioural, emotional, and temporal features. The first approach adopts a normalised scoring function, which rescales interaction variables into the interval [0, 1] and aggregates them into a bounded index of relational closeness. This formulation provides a compact representation of tie strength but may limit sensitivity to extreme behaviours.
Table 2 reports the attachment values computed using this normalised model. All scores are constrained between 0 and 1, reflecting relative proportions of engagement across the available interaction features.
The second approach employs a weighted linear combination, in which features retain their raw scale and are assigned empirically derived coefficients. Unlike the normalised variant, this formulation yields unbounded scores—including negative values—and applies stronger penalisation for prolonged inactivity. The intent is to capture a broader dynamic range of tie strength, particularly for distinguishing weak or dormant relationships.
Table 3 presents the attachment values calculated using this weighted function, illustrating the expanded score distribution and increased sensitivity to recency and emotional cues.
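To make the two formulations concrete, the sketch below implements a normalised and a weighted attachment score under stated assumptions; the feature set, ranges, and coefficients are illustrative placeholders and do not reproduce the empirically derived weights behind Tables 2 and 3.

```python
# Hedged sketch of the two attachment scoring variants described above.
# Feature names, ranges, and coefficients are illustrative, not the paper's values.
import numpy as np

def normalised_attachment(features, feature_ranges):
    """Rescale each feature to [0, 1] and average into a bounded index."""
    scaled = []
    for name, (lo, hi) in feature_ranges.items():
        value = (features[name] - lo) / (hi - lo) if hi > lo else 0.0
        scaled.append(np.clip(value, 0.0, 1.0))
    return float(np.mean(scaled))

def weighted_attachment(features, weights, inactivity_penalty=0.1):
    """Weighted linear combination on raw scales with a penalty for inactivity."""
    score = sum(weights[name] * features[name] for name in weights)
    return score - inactivity_penalty * features["days_inactive"]

user = {"messages": 42, "intimacy_words": 7, "sentiment": 0.6,
        "recency": 1.0 / (1 + 120),   # inverted days since last interaction
        "days_inactive": 120}

ranges = {"messages": (0, 500), "intimacy_words": (0, 50),
          "sentiment": (-1.0, 1.0), "recency": (0.0, 1.0)}
weights = {"messages": 0.5, "intimacy_words": 2.0, "sentiment": 10.0}

print(round(normalised_attachment(user, ranges), 3))  # bounded in [0, 1]
print(round(weighted_attachment(user, weights), 3))   # unbounded, may be negative
```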
The normalised scores in Table 2 indicate that even with minimal interaction, attachment strength remains above a relatively high baseline. For example, ID10—with no recent messages, wall posts, or comments—still received a score of 0.17. Similarly, ID5, despite having no recent communication but a high number of historical comments, achieved a score of 0.66. These results suggest that the normalised formula tends to assign generous baseline values whenever any form of interaction is present, even if outdated or one-dimensional. This inflates weaker ties and reduces the model’s ability to clearly distinguish passive from active relationships.
In contrast, the weighted formulation in Table 3 produces a much wider range of values, spanning from the lowest score (ID19) to the highest (ID11). For instance, ID19 and ID15 both exhibited over 900 days of inactivity, yet differences in intimacy words and emotional content led to variations in their scores by about 15 units. ID18, with only 189 days since last communication and strong intimacy/emotion signals, obtained the least negative score. These cases illustrate the weighted model’s heightened sensitivity to both recency and emotional richness.
This numerical contrast is substantial: the normalised formula compresses users into a narrow band (range ∼0.82), while the weighted formulation spans a much broader interval (range ∼655). Such a dynamic range provides finer resolution in differentiating tie strengths, which in turn enhances the effectiveness of downstream tasks such as classification and clustering. Models can better exploit the distributional richness of the weighted scores, achieving clearer separation between strong and weak ties.
Overall, the experiments suggest that while both scoring approaches align with observable user behaviour, the linear-weighted function more effectively captures tie strength variation. Its ability to penalise inactivity and amplify emotional signals makes it particularly suitable for modelling subtle psychological traits and social attachment patterns in Facebook interactions.
5.2. Classification Performance Analysis
To evaluate the predictive power of attachment strength as a feature for modelling users’ mental state indicators, we tested a broad set of machine learning models under two scoring schemes: the normalised and the weighted formulations described earlier. The task involved predicting mental well-being labels derived from behavioural and emotional Facebook activity. Models were evaluated with stratified 10-fold cross-validation using six metrics: accuracy, precision, recall, F1-score, AUC-ROC, and PR-AUC. The inclusion of AUC-ROC and PR-AUC is particularly important in the presence of class imbalance, as they provide more robust measures of discriminative ability and precision–recall trade-offs.
The class labels for mental state categories were approximately balanced (positive 54%, at-risk 46%), and stratified cross-validation preserved this distribution. Given the minor imbalance, no resampling techniques (e.g., SMOTE) were required, although class weighting was applied where relevant.
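The evaluation protocol can be summarised by the sketch below, which runs stratified 10-fold cross-validation with the six reported metrics; the synthetic data and the two classifiers shown are placeholders for the full model suite.

```python
# Hedged sketch of the evaluation protocol: stratified 10-fold CV over
# multiple classifiers and metrics. X and y stand in for the behavioural
# feature matrix and mental state labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(n_samples=500, n_features=8,
                           weights=[0.54, 0.46], random_state=42)  # synthetic stand-in

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc", "average_precision"]

for name, model in [("LogReg", LogisticRegression(max_iter=1000, class_weight="balanced")),
                    ("RandomForest", RandomForestClassifier(random_state=42))]:
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    print(name, {m: round(scores[f"test_{m}"].mean(), 3) for m in scoring})
```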
Table 4 reports the results obtained with normalised attachment scores, which constrain outputs to the interval [0, 1] and represent a bounded probabilistic measure of tie strength.
Table 5 shows the results with the weighted formulation, which yields real-valued scores (including negatives) and places stronger emphasis on recency, intimacy, and emotional tone.
Across both experiments, BERT consistently delivered the strongest results, achieving an accuracy of 0.95 with normalised attachment scores and 0.96 with weighted scores, alongside an AUC-ROC of up to 0.98 and PR-AUC of 0.97. Its ability to capture contextual and semantic nuances in user-generated content likely explains this performance edge. Ensemble-based learners such as XGBoost and LightGBM also performed robustly, with both models exceeding 0.93 in accuracy and F1-score, while reaching PR-AUC values above 0.95 under the weighted formulation. These findings suggest that advanced models which exploit non-linear feature interactions and contextual embeddings are particularly effective in leveraging attachment-based signals.
The introduction of the weighted scoring function contributed positively to classification discriminability. Compared with the normalised formulation, the weighted version improved recall, F1-score, AUC-ROC, and PR-AUC across most algorithms, reflecting its enhanced ability to separate weak and strong ties through finer granularity. By penalising prolonged inactivity more strongly and amplifying emotionally significant interactions, the weighted formulation provided a richer feature representation that facilitated better generalisation across classifiers. The improved PR-AUC in particular demonstrates its strength in imbalanced settings, where correctly identifying minority cases (e.g., vulnerable users) is essential.
Traditional models such as Logistic Regression and Naive Bayes demonstrated limitations despite occasionally achieving high precision. Their recall values and PR-AUC scores remained low, indicating frequent failure to identify weaker attachment categories. While their performance improved modestly with weighted scores, the gains were not comparable to those observed in more sophisticated learners. In contrast, models sensitive to interaction effects—such as Gradient Boosting and LSTM—benefited noticeably from the revised scoring dynamics, showing improved balance across all six metrics. These improvements confirm that the revised formulation provides features that align better with the inductive biases of non-linear learners.
Taken together, these results confirm that the design of the attachment strength function plays a critical role in downstream prediction tasks. By integrating recency, intimacy, and emotional tone into a weighted formulation, the revised model aligns more closely with ground-truth user states and enhances predictive accuracy. Moreover, the inclusion of AUC-ROC and PR-AUC reveals the robustness of these improvements, particularly under class imbalance. This demonstrates the value of psychologically informed, data-driven indicators when modelling affective states and social attachment from online behaviour.
To confirm that performance improvements were statistically reliable, we conducted paired t-tests across cross-validation folds, comparing the top-performing models (BERT, XGBoost, LightGBM) against a baseline classifier (Logistic Regression). Table 6 reports the p-values, showing that the advanced models achieved significantly higher F1-scores at the adopted significance level.
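The significance test follows the standard paired design over matched folds, as sketched below; the per-fold F1-scores shown are hypothetical placeholders, not values from Table 6.

```python
# Hedged sketch of the paired t-test across CV folds.
# The per-fold F1-scores below are hypothetical placeholders.
import numpy as np
from scipy.stats import ttest_rel

f1_model_a = np.array([0.95, 0.94, 0.96, 0.95, 0.93, 0.96, 0.95, 0.94, 0.95, 0.96])
f1_baseline = np.array([0.88, 0.87, 0.89, 0.86, 0.88, 0.87, 0.88, 0.86, 0.87, 0.89])

t_stat, p_value = ttest_rel(f1_model_a, f1_baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```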
Feature Relevance Analysis
To provide a data-driven justification for the selected behavioural and emotional characteristics, we conducted a feature importance study using Random Forest and Gradient Boosting models, complemented by Pearson correlation analysis with attachment strength scores.
Table 7 presents the relative contributions of each feature, averaged across the two ensemble methods. The analysis confirms that intimacy-related words and sentiment polarity are the strongest predictors, followed by recency of interaction and message frequency. Wall posts and comments contributed moderately, while emojis and punctuation showed negligible importance and were excluded from the final feature set. These findings validate that the chosen features not only align with theoretical constructs of social attachment but also maximise predictive utility.
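The analysis can be reproduced in outline with the sketch below, which averages impurity-based importances from Random Forest and Gradient Boosting and reports Pearson correlations against the attachment score; the synthetic data and feature names are illustrative.

```python
# Hedged sketch of the feature relevance analysis: averaged ensemble
# importances plus Pearson correlation with the attachment score.
# Data are synthetic placeholders.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 4)),
                 columns=["intimacy_words", "sentiment", "recency", "messages"])
y = 2 * X["intimacy_words"] + X["sentiment"] + rng.normal(scale=0.5, size=300)

rf = RandomForestRegressor(random_state=0).fit(X, y)
gb = GradientBoostingRegressor(random_state=0).fit(X, y)
importance = (rf.feature_importances_ + gb.feature_importances_) / 2

for col, imp in zip(X.columns, importance):
    r, _ = pearsonr(X[col], y)
    print(f"{col:15s} importance={imp:.3f} pearson_r={r:+.3f}")
```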
5.3. Unsupervised Clustering of Behavioural Patterns
To uncover latent behavioural groupings in the user population, we applied unsupervised clustering to the Facebook activity dataset. Four widely used algorithms were evaluated—K-Means, Gaussian Mixture Model (GMM), Agglomerative Clustering, and DBSCAN—over normalised behavioural features (message frequency, words of intimacy, sentiment score, and days since last interaction). The objective was to identify coherent clusters that reflect distinct patterns of social engagement and emotional expression.
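A condensed sketch of this clustering setup is shown below; the synthetic feature matrix and the hyperparameters (three clusters, default DBSCAN settings) are illustrative stand-ins for the configuration applied to the Facebook data.

```python
# Hedged sketch of the four-algorithm clustering comparison on a
# standardised behavioural feature matrix (synthetic stand-in data).
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
raw = rng.normal(size=(200, 4))     # stand-in for (messages, intimacy, sentiment, recency)
X = StandardScaler().fit_transform(raw)

labels = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X),
    "GMM": GaussianMixture(n_components=3, random_state=1).fit_predict(X),
    "Agglomerative": AgglomerativeClustering(n_clusters=3).fit_predict(X),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5).fit_predict(X),
}
for name, lab in labels.items():
    print(name, "clusters:", sorted(set(lab)))
```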
Table 8 reports cluster assignments for a representative subset of 20 users across the four algorithms. Each user is described by four core features (messages, intimacy word frequency, sentiment, and recency), chosen to capture communication intensity, emotional valence, and temporal dynamics—key signals that can relate to underlying mental states.
As shown in Table 8, K-Means, GMM, and Agglomerative Clustering consistently identified three distinct behavioural groups. These clusters span a spectrum of social engagement—from high-affinity users characterised by frequent, emotionally expressive communication (e.g., users 3, 10, and 17) to disengaged individuals with prolonged inactivity or minimal sentiment (e.g., users 4, 6, and 20). K-Means and GMM showed particularly strong alignment, assigning nearly identical labels to users with similar profiles.
Agglomerative Clustering also produced coherent groups but diverged on borderline cases—such as users 8 and 10—where linkage criteria likely influenced boundary placement. Unlike the centroid-based separation of K-Means or the probabilistic flexibility of the GMM, the hierarchical merging process can fragment tightly knit groups under certain feature configurations. Nonetheless, the broad agreement across these three methods supports the reliability of the observed segments.
By contrast, DBSCAN failed to identify meaningful clusters in this subset. All users received the label -1 (noise), indicating that under the specified hyperparameters the data lacked sufficient density to form core points. This behaviour is consistent with high-dimensional sparsity and overlapping user patterns that are not well captured by global distance thresholds. Without domain-specific tuning of density parameters, DBSCAN appears unsuitable for this behavioural dataset.
To ensure that DBSCAN’s poor performance was not simply due to arbitrary parameter choices, we performed a systematic grid search over a wide range of eps (0.1–5.0) and min_samples (3–20) values. Across all tested configurations, DBSCAN consistently failed to identify stable or meaningful clusters, with the vast majority of points labelled as noise. This suggests that the algorithm’s density-based assumptions are poorly matched to the sparsity and overlap inherent in Facebook interaction data, rather than being a consequence of suboptimal hyperparameter selection.
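The sweep can be expressed as a simple nested grid, as sketched below; the synthetic feature matrix and the viability criterion (at least two clusters with under 50% noise) are illustrative assumptions.

```python
# Hedged sketch of the DBSCAN sensitivity sweep; grids mirror the reported
# ranges, and X is a synthetic stand-in for the standardised feature matrix.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(np.random.default_rng(1).normal(size=(200, 4)))

viable = []
for eps in np.arange(0.1, 5.01, 0.1):
    for min_samples in range(3, 21):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        noise = float(np.mean(labels == -1))
        if n_clusters >= 2 and noise < 0.5:  # illustrative viability criterion
            viable.append((round(float(eps), 1), min_samples, n_clusters, noise))

print(f"{len(viable)} configurations produced 2+ clusters with <50% noise")
```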
To assess internal cluster quality, we computed silhouette scores for each algorithm (Table 9). The silhouette measures how well each point fits within its assigned cluster relative to others; values closer to 1 indicate compact, well-separated clusters.
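For clarity, the silhouette computation reduces to a single scikit-learn call, as in the sketch below; the synthetic data and K-Means labels are placeholders.

```python
# Hedged sketch: silhouette score for one clustering on synthetic stand-in data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.default_rng(1).normal(size=(200, 4))   # stand-in feature matrix
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
print(round(silhouette_score(X, labels), 3))
```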
K-Means achieved the highest silhouette score (0.184), indicating relatively compact clusters and clear separation. The GMM followed closely (0.180), suggesting its probabilistic modelling was similarly effective in capturing behavioural structure. Agglomerative Clustering obtained a slightly lower score (0.174), consistent with less compact groupings when hierarchical linkage is applied to high-dimensional features. In stark contrast, DBSCAN scored 0.051, confirming weak structure under the tested parameters.
It is important to note that the silhouette scores obtained in our experiments (all below 0.2) indicate relatively weak cluster separation. This limitation likely stems from the high-dimensional and overlapping nature of Facebook interaction features, where behavioural signals (e.g., sentiment, intimacy, activity frequency) may not form sharply delineated groups. While the absolute values are low, the comparative differences between algorithms remain informative: K-Means and GMM consistently produced more cohesive clusters compared to Agglomerative Clustering and DBSCAN. Thus, although cluster compactness is limited, relative performance rankings still provide meaningful evidence about which algorithms are better suited to this type of social interaction data.
Overall, the numerical evidence highlights that K-Means and GMM provide the most reliable clustering results for this dataset, while Agglomerative Clustering offers moderate performance and DBSCAN is unsuitable under the tested conditions.
We also evaluated cross-method consistency using pairwise Normalised Mutual Information (NMI) (Table 10). NMI quantifies agreement between two clusterings on a 0–1 scale, independent of label permutations.
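The agreement measure is computed as in the sketch below; the two label arrays are toy placeholders chosen to show that NMI is invariant to label permutation.

```python
# Hedged sketch: pairwise agreement between two clusterings via NMI.
from sklearn.metrics import normalized_mutual_info_score

kmeans_labels = [0, 0, 1, 1, 2, 2, 0, 1]
gmm_labels    = [1, 1, 0, 0, 2, 2, 1, 0]   # same partition with permuted labels
print(normalized_mutual_info_score(kmeans_labels, gmm_labels))  # close to 1.0
```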
The highest agreement was observed between K-Means and GMM (NMI = 0.86), indicating that both recover highly similar structures. GMM also aligned well with Agglomerative Clustering (0.76), while K-Means and Agglomerative showed moderate consistency (0.72). DBSCAN exhibited very low agreement with all other methods (NMI 0.09–0.12), reflecting its divergent behaviour and instability on this dataset.
The numerical evidence confirms that centroid-based (K-Means) and probabilistic (GMM) models converge on highly consistent cluster structures, whereas DBSCAN diverges strongly, reinforcing its unsuitability for this type of behavioural data.
Finally, Table 11 summarises cluster-size distributions for the full dataset of 2500 users across all algorithms.
K-Means and GMM produced relatively balanced partitions, indicating uniform segmentation of user behaviour. Agglomerative Clustering yielded a more uneven distribution, with Cluster 1 covering 59.2% of users, suggesting possible over-merging under its linkage criterion. DBSCAN identified only a single valid cluster and flagged 2.28% of users as noise, corroborating its poor fit for the present feature space and parameter settings.
For visual comparison, Figure 2 provides two-dimensional projections of the clustering outcomes for each algorithm.
Figure 2 is consistent with the quantitative metrics. K-Means and GMM produce compact, well-separated structures, supporting their suitability for behavioural partitioning in this context. Agglomerative Clustering shows more elongated and overlapping formations, reflecting weaker separation despite identifying three functional groups. DBSCAN again fails to reveal meaningful structure: most points collapse into a single blob or are marked as noise, underscoring its sensitivity to sparsity and parameterisation on high-dimensional Facebook activity profiles.
5.4. Interpretive Discussion and Insights
The results across all three experimental components provide a coherent and multifaceted perspective on the computational modelling of social attachment and mental state indicators from Facebook activity. A central finding is that the two attachment strength formulations produced markedly different distributions. The normalised scoring function constrained values to the interval [0, 1], but often failed to distinguish between passive users and moderately active ones, thereby compressing tie-strength variability. By contrast, the weighted formulation introduced an unbounded scale that more sharply penalised prolonged inactivity and amplified the impact of emotional signals, yielding a richer and more discriminative representation. This divergence was particularly evident for users with low engagement who still received relatively high scores under the normalised model, potentially masking differences in their relational significance.
These scoring differences had a direct effect on supervised classification. Both formulations enabled accurate prediction of mental state indicators, yet models trained on weighted attachment scores consistently outperformed those using normalised scores. The strongest gains were observed for advanced learners such as BERT, XGBoost, and LightGBM, which achieved superior accuracy and F1-scores. Importantly, recall also improved under the weighted formulation, highlighting its sensitivity to subtle behavioural variations that may reflect early warning signs of emotional distress, withdrawal, or stress. This is particularly relevant in digital mental health contexts, where false negatives—i.e., failing to detect at-risk users—can have serious implications.
In contrast, traditional learners such as Logistic Regression and Naive Bayes, while computationally efficient, underperformed in recall and F1-score, suggesting that they lacked the capacity to fully capture the non-linear interplay of emotional and temporal features. This comparison highlights the importance of selecting model families aligned with the complexity of the underlying constructs being modelled.
Unsupervised clustering further validated the attachment framework. K-Means and GMM consistently produced coherent and interpretable behavioural groupings, supported by higher silhouette scores and strong Normalised Mutual Information (NMI) agreement. These clusters aligned with latent dimensions of social interaction ranging from consistent, emotionally rich engagement to sporadic or minimal communication, reflecting well-established psychological typologies. Agglomerative Clustering, while less stable, still identified meaningful groupings, though with uneven cluster sizes. By contrast, DBSCAN consistently failed to identify actionable partitions, largely due to the sparsity and overlap of high-dimensional features combined with sensitivity to density parameters. This underperformance highlights the limitations of density-based approaches when applied to behavioural social media data.
The contrast between centroid-based (K-Means), probabilistic (GMM), and hierarchical (Agglomerative) approaches also demonstrates how different algorithmic assumptions shape the resulting behavioural segments: centroid and probabilistic models converged on highly similar structures, while hierarchical clustering tended to over-merge groups. This reinforces the importance of method selection when the goal is to capture nuanced, fine-grained patterns of attachment.
Taken together, these results demonstrate that attachment strength—when modelled with emotional, behavioural, and temporal nuance—serves as a robust organising variable for downstream analysis. Both supervised and unsupervised evaluations converged on consistent behavioural structures, reinforcing theoretical assumptions from attachment theory and affective computing. The consistency across fundamentally different methodological paradigms (classification vs. clustering) also strengthens the external validity of the framework, suggesting that attachment-based features generalise well across analytical settings.
The convergence of results across model families with very different inductive biases further supports the external validity of the attachment features: whether interpreted through statistical learners, boosting ensembles, or neural architectures, the same underlying behavioural constructs emerged as discriminative and stable.
Ultimately, the study shows that mental state indicators can be inferred from online behavioural data with both high accuracy and interpretability, provided that features are grounded in psychologically meaningful constructs. The integration of emotional valence, intimacy markers, and interaction recency emerges as crucial for achieving both granularity and theoretical relevance. Beyond methodological advances, these findings carry practical implications: they can inform the design of mental health monitoring tools, digital intervention systems, and ethically responsible social computing applications. Understanding the nuances of online connectedness is foundational not only for improving predictive performance but also for ensuring that algorithmic insights support user well-being in a transparent and socially responsible manner.
5.5. Explaining Algorithmic Performance Differences
The comparative evaluation revealed systematic differences in how various algorithms handled the attachment-based features. These differences can be explained by the inductive biases and representational capacities of each model family.
Transformer-based models (BERT). BERT consistently achieved the highest performance across all tasks. This can be attributed to its ability to capture nuanced emotional and psychological signals from text, leveraging contextual embeddings that go beyond surface-level word counts or sentiment scores. Its attention mechanism allows the model to focus on subtle linguistic cues that strongly correlate with attachment and mental state indicators.
Ensemble learners (XGBoost, LightGBM, CatBoost). Gradient-boosting ensembles performed nearly as well as BERT, benefiting from their ability to model complex non-linear feature interactions and handle heterogeneous feature types. These models also provided robustness against noise, which is common in user-generated social data.
Traditional classifiers (Logistic Regression, Naïve Bayes, SVM). While interpretable, these models struggled with recall, often failing to detect weaker or borderline cases of attachment. Their reliance on linear boundaries or simplified probabilistic assumptions limited their ability to capture the complex interplay between temporal, emotional, and behavioural features.
Neural architectures (ANN, LSTM). Feedforward and recurrent models offered moderate gains over traditional baselines by capturing temporal dynamics in user interactions. However, without the deep contextual embeddings available to transformers, they underperformed compared to BERT. LSTM models were particularly useful for modelling sequential activity patterns but were less effective with sparse data.
Clustering algorithms. K-Means and GMM achieved consistent and interpretable groupings because their inductive assumptions (centroid proximity and probabilistic mixture modelling) aligned well with the feature space shaped by attachment scores. Agglomerative Clustering, though meaningful, produced imbalanced groups due to sensitivity to linkage choices. DBSCAN underperformed because behavioural features lacked the density structure required for its neighbourhood-based approach, especially in high-dimensional sparse settings.
Overall, these findings highlight that models capable of leveraging non-linear dependencies and rich contextual information (e.g., BERT, gradient-boosting ensembles) are most effective for attachment-based behavioural prediction. In contrast, simpler classifiers offer interpretability but at the cost of sensitivity, while density-based clustering methods are unsuitable for this data domain.
6. Conclusions and Future Work
This paper presented a comprehensive computational framework for modelling social attachment and inferring mental state indicators from Facebook activity using machine learning techniques. Drawing upon principles from social psychology, affective computing, and behavioural analytics, we proposed a dual scoring mechanism for quantifying interpersonal attachment strength by integrating temporal recency, emotional tone, and communication features. These scores were subsequently employed in both supervised classification and unsupervised clustering tasks, enabling the identification of latent behavioural segments and user well-being profiles.
Experimental evaluation confirmed the value of the proposed approach: the weighted attachment strength formulation offered more granular and discriminative representations than the normalised variant, and advanced classifiers such as BERT and gradient-boosting models consistently achieved strong predictive performance. Unsupervised clustering with K-Means and GMM further revealed coherent user groupings, validating the structural soundness of the attachment model. Rather than focusing on specific performance metrics already detailed earlier, these results underscore that integrating temporal and emotional signals is essential for modelling nuanced aspects of social connectedness and psychological state.
Beyond empirical validation, the framework contributes to interdisciplinary understanding by operationalising abstract constructs—such as intimacy, trust, and attachment—into quantifiable indicators. The interpretive discussion linked algorithmic outcomes to theoretical assumptions, demonstrating how observable online behaviours can reflect broader mental health trajectories. These findings support the potential of data-driven systems to augment early-warning tools and targeted interventions in digital mental health contexts.
Future work will extend this research along several directions. First, the scoring functions can be expanded to incorporate richer multimodal data, including images, reactions, and user metadata, thereby enhancing attachment modelling fidelity. Second, temporal dynamics will be modelled using recurrent or attention-based architectures to better capture behavioural evolution and shifts over time. Third, ethical considerations—encompassing consent, transparency, and algorithmic fairness—will be addressed more explicitly to ensure that affect-aware technologies are deployed responsibly in sensitive contexts.
This study is not without limitations. All experiments were conducted on a single Facebook-derived dataset, which—while ensuring internal consistency—restricts external validation. Cultural and demographic biases may also be present, as the sample is not globally representative. Furthermore, the reliability of self-reported mental state labels remains an inherent challenge, as such annotations are subject to subjectivity and potential inconsistency. Finally, feature extraction was constrained by the platform’s available signals, which may not fully capture offline attachment or psychological states. Future research should address these issues by validating the framework across multiple platforms (e.g., Twitter, Reddit, LinkedIn), recruiting more culturally diverse samples, and incorporating richer multimodal ground-truthing.
Beyond research implications, practical deployment pathways should also be explored. The framework could be integrated into digital well-being platforms, social media monitoring systems, or early-warning tools for clinicians. Deployment would require scalable APIs for real-time feature extraction, efficient infrastructure for attachment score computation, and interpretable interfaces that allow practitioners to act upon system outputs. Collaboration with mental health professionals and cross-platform testing will be key to ensuring both practical relevance and responsible adoption.
In conclusion, this study provides a scalable and interpretable pipeline for extracting meaningful psychological and relational insights from Facebook user activity. By combining data-driven learning with theoretical grounding, the proposed approach establishes a foundation for advancing both computational social science and digital mental health analytics.