Article

Multilayer Network Modeling for Brand Knowledge Discovery: Integrating TF-IDF and TextRank in Heterogeneous Semantic Space

by Peng Xu 1, Rixu Zang 2, Zongshui Wang 3,* and Zhuo Sun 4

1 School of Humanities, Jiaozuo University, Jiaozuo 454000, China
2 Agricultural Bank of China Limited Weifang Branch, Weifang 261000, China
3 Business School, Beijing Information Science and Technology University, Beijing 100192, China
4 School of Information Management, Zhengzhou University, Zhengzhou 450001, China
* Author to whom correspondence should be addressed.
Information 2025, 16(7), 614; https://doi.org/10.3390/info16070614
Submission received: 26 May 2025 / Revised: 13 July 2025 / Accepted: 15 July 2025 / Published: 17 July 2025

Abstract

In the era of homogenized competition, brand knowledge has become a critical factor that influences consumer purchasing decisions. However, traditional single-layer network models fail to capture the multi-dimensional semantic relationships embedded in brand-related textual data. To address this gap, this study proposes a brand knowledge multilayer network (BKMN) framework integrating TF-IDF and TextRank algorithms for comprehensive brand knowledge discovery. By analyzing 19,875 consumer reviews of a mobile phone brand from the JD.com platform, we constructed a tri-layer network comprising TF-IDF-derived keywords, TextRank-derived keywords, and their overlapping nodes. The model incorporates co-occurrence matrices and centrality metrics (degree, closeness, betweenness, eigenvector) to identify semantic hubs and interlayer associations. The results reveal that consumers prioritize attributes such as “camera performance”, “operational speed”, “screen quality”, and “battery life”. Notably, the overlap layer exhibits the highest node centrality, indicating convergent consumer focus across algorithms. The network demonstrates small-world characteristics (average path length = 1.627) with strong clustering (average clustering coefficient = 0.848), reflecting cohesive consumer discourse around key features. Meanwhile, this study proposes the Mul-LSTM model for sentiment analysis of the reviews; it achieves 93% sentiment classification accuracy and reveals that positive attitudes toward the brand’s cell phones predominate, providing a quantitative basis for enterprises to understand users’ emotional tendencies and optimize brand word-of-mouth management. This research advances brand knowledge modeling by synergizing heterogeneous algorithms and multilayer network analysis. Its practical implications include enabling enterprises to pinpoint competitive differentiators and optimize marketing strategies. Future work could extend the framework to incorporate sentiment dynamics and cross-domain applications in the smart home or cosmetics industries.

1. Introduction

In the current highly competitive market environment, which is characterized by escalating product homogenization, brands serve as critical identifiers for enterprises, products, or geographical regions, and their significance is self-evident. As the core manifestation of brand characteristics, brand knowledge has emerged as a decisive factor influencing consumers’ purchasing decisions [1]. Research in this field contributes to the systematic investigation of brand knowledge, allowing researchers to conduct comprehensive analyses of product reviews under specific brand categories and synthesize consumer focus points. Such scholarly inquiry enables enterprises to better comprehend consumers’ differentiated brand selection behaviors, thereby facilitating the development of targeted marketing strategies that help them overcome the challenges posed by homogenization-driven competitive stagnation.
While complex network research has yielded substantial achievements in biological, social, and engineering domains, its focus has been predominantly centered on single-layer networks, with multidimensional network studies remaining relatively scarce. On one hand, consumer decision-making processes are inherently influenced by the interplay of multiple factors; sole reliance on singular dimensions introduces methodological insufficiency [2]. Systematically investigating brand knowledge through multilayer network frameworks—transcending the homogeneity constraints of nodes and edges in single-layer networks—holds critical significance. On the other hand, the exponential growth of big data and continuous advancements in data information technology necessitate enhanced technical capabilities for the processing of massive datasets. To extract deeper insights and uncover latent associations within data, constructing multilayer networks for hierarchical data processing emerges as an essential analytical strategy [3]. In addition, the emotional tendency embedded in consumers’ reviews is an important complementary dimension in brand knowledge discovery research. Sentiment analysis refers to the process of identifying and analyzing the emotions expressed in textual data by means of natural language processing and machine learning techniques. Sentiment analysis of product reviews can uncover information such as consumers’ attitudes toward brands, brand strengths and weaknesses, and so on.
Accordingly, in this study, two keyword extraction methods, TF-IDF and TextRank, are combined with multilayer network features to construct a brand knowledge multilayer network (BKMN) model for branded merchandise reviews, which are characterized by diverse and loosely connected forms of expression. The extracted keywords are analyzed by centrality to mine the intra- and interlayer correlations contained in the network layers, to identify the central keywords of the reviews, and to re-rank the keywords accordingly. We then characterize the topological properties of the network as a whole and of each individual layer, and finally perform a sentiment analysis of the reviews with an improved method. This enables us to complete the task of extracting the central keywords of knowledge discovery while supporting the discovery of sentiment tendencies in reviews of branded goods.
The following are the main contributions of this study:
  • A brand knowledge discovery model based on a multilevel network is constructed: keywords are selected using the TFIDF algorithm and the TextRank algorithm; these keywords are used as two network levels of the multilevel network, and the duplicate nodes in the two algorithms are used as the intermediate network level, leading to a total of three network levels.
  • Analysis of the results shows that consumers are most concerned about the photo function, running speed, sound effects, sound quality, appearance, and battery life of the brand’s cell phones.
  • Optimization of the sentiment analysis method: the Mul-LSTM model is proposed for binary sentiment classification. After parameter tuning, the model achieves 93% accuracy on the binary classification task.

2. Literature Review

2.1. Brand Knowledge Discovery

Knowledge discovery is the process of extracting valuable insights from large volumes of complex raw data while identifying valid, novel patterns that support informed decision-making and user actions [4]. The overall process typically comprises five stages: data selection, preprocessing, transformation, data mining, and evaluation/interpretation [5]. Brand knowledge refers to consumers’ perceptions and cognition of a brand, encompassing its history, image, values, market positioning, and promotional strategies [6]. Such brand-related knowledge is frequently derived from user-generated content, especially product reviews related to the brand. Accordingly, brand knowledge discovery can be defined as the process of analyzing brand cognition-related data through computational algorithms in order to extract and visualize relevant information, with product reviews serving as the primary data source. This process enables firms to gain insights into consumer attitudes and preferences toward a brand and subsequently adjust their marketing strategies. However, brand knowledge involves complex review information, and few studies have investigated consumers’ perceptions of brands in depth by building multilevel network models; therefore, this study contributes to the literature on brand knowledge from the perspective of multilevel networks.
The current research on brand knowledge discovery based on the analysis of product reviews largely employs natural language processing and machine learning methods. Lu and colleagues used text mining technology to build a list of keywords representing brand personality, revealed the key factors affecting brand personality through sentiment analysis, categorized brand personality, and created brand perception charts based on the sentiment analysis to reveal personality differences between brands [7]. Joyce and colleagues used a naïve Bayes classifier to analyze the sentiment of brand reviews and improved the accuracy of the model by reducing redundant vocabulary and increasing the weight of meaningful comments, effectively predicting the positive and negative sentiment of the reviews and providing useful information for consumers and enterprises [8]. In addition, many other methods can be used in knowledge discovery research: the LDA method assumes that documents are generated by a mixture of multiple topics and captures the implicit topic structure by inferring the topic distribution of words through a probabilistic model [9]; word embedding approaches map words to a low-dimensional vector space and capture the contextual semantic associations of words by representing semantic similarity through vector distances [10]; and neural network approaches process word sequences through neural architectures and automatically extract semantic features, making them suitable for sentiment analysis and text categorization [11]. Compared to these methods, BKMN supports the structured analysis of brand knowledge through the multilayer network fusion of TF-IDF and TextRank, capturing both word frequency and semantic associations; additionally, the network topology of BKMN is more interpretable. The advantages of TF-IDF lie in statistical efficiency and high-frequency feature capture [12]; the advantages of TextRank lie in semantic association modeling and graph structure analysis [13]. Together they retain the statistical features of brand reviews and capture semantic relationships through multilayer network fusion, providing methodological support for brand knowledge discovery with both efficiency and depth.

2.2. Multilayer Networks

Research on networks can be traced back to 1736, when Leonhard Euler introduced graph theory by abstracting real-world problems into vertices and edges. In 1959, random graph theory was proposed, leading to the development of the Erdős–Rényi (ER) model [14]. Subsequently, the small-world (WS) [15] and scale-free (BA) [16] network models were introduced in 1998 and 1999, respectively; these stimulated increasing academic interest in the study of complex networks. Given the intricate and dynamic nature of real-world systems, single-layer network approaches often fail to capture interdependencies among system components. To overcome this limitation, the concept of multilevel networks was introduced [17], facilitating the representation of heterogeneous node types and diverse inter-edge relationships. To precisely capture the multi-attribute properties of nodes and edges in such networks, scholars have proposed various theoretical frameworks. For instance, Zhang et al. modified the original PageRank algorithm to develop a smoothed, distributed Markov chain model for analyzing complex network relationships [18]. Moreover, Rahmede et al. proposed the MultiRank algorithm to calculate the weighted rankings of nodes and layers in large-scale multiplex networks [19]. Multilayer networks are employed to represent and analyze interrelated entities, where distinct entities and connections may give rise to variations in multilayer networks [20].
Currently, research on multilevel networks is mainly focused on biological networks, transportation networks, communication networks, and interpersonal social networks; less research has been performed on keyword knowledge of brand product reviews. For example, in social networks, the influence relationship between different levels is mainly revealed through multilevel network analysis [21]; in biological networks, the complexity of biological systems is shown through multilevel network modeling [22]. In the field of brand knowledge, most of the studies are single-level networks, but brand review data are mostly characterized by independent heterogeneity and sharing difficulty; these characteristics make them suitable for multilevel network analysis, supporting the accurate discovery of the center of consumers’ interest in a brand.
In addition, in terms of keyword extraction methods, given the semantic complexity and colloquialism of branded goods reviews, it is more accurate to use the TF-IDF and TextRank keyword extraction algorithms to identify important words and phrases in the reviews; the extracted keywords are usually directly related to the topics of the reviews and can provide valuable information for tasks such as summarization, classification, and searching. Therefore, this study focuses on identifying nodes and edges using a co-occurrence matrix, selecting keywords with the TF-IDF and TextRank algorithms to construct a multilevel network model, and analyzing the sentiment of the comments.

3. Methodology

3.1. Data Sources and Hierarchy

In the selection of branded goods data, Jingdong Mall, as a large-scale comprehensive e-commerce shopping mall with strong commercial strength in China, has high user influence and a high share of the domestic market; therefore, it was used as a source of data acquisition in this study. In the specific choice of brand category, smartphones were chosen. Smartphones are indispensable in the present day; they are used for many daily functions, including communication and payments. In addition, there are many brands of cell phones with strong competition among them. Generally, people use their cell phones for a long time, so the study of the commodity brands producing cell phones is of great significance. Therefore, this study selected Jingdong Mall’s cell phone brand commodity reviews as a data source for analysis and research. There are many cell phone brands, so it was difficult to account for all of them; therefore, in this study, we selected a relatively representative cell phone brand from Jingdong’s self-owned stores. The commodity attribute and user review information of the cell phone products from the brand was obtained.
The TFIDF algorithm is one of the most commonly used methods for calculating feature weights, which can be used to evaluate the importance of specified words in a whole text. Although this method is widely used, it still suffers from the problem of data sparsity. This method only considers the frequency of occurrence of words in the text and the frequency of inverse documents in the whole corpus; it lacks sufficient consideration of the information that comes before and after keywords. Additionally, this method cannot deal with semantic information in a text, and it does not consider the association between words nor reflect the positional information of words; therefore, it is difficult to effectively reflect the distribution of feature words. The TextRank algorithm is a graph-based ranking algorithm that is usually used for unsupervised keyword extraction without the influence of subjective factors. The algorithm originates from Google’s PageRank algorithm, which utilizes the word co-occurrence information within a document to extract keywords; therefore, it accounts for the text co-occurrence relationship, meaning that it can deal with some polysemous words and synonyms, and can be applied in the analysis of various text types. However, the algorithm also has some drawbacks. For example, it cannot effectively handle the case of multiple topics or multiple intents; at the same time, it is relatively inefficient in processing long texts.
Centrality is a measure of the influence of the nodes in a network that can be used to identify important nodes in different domains, such as social networks, biological networks, urban transportation networks, and epidemiological transmission networks. By combining the centrality of multilevel networks with keyword extraction algorithms, the central keywords in a network can be calculated to determine the central vocabulary of a text. In addition, by observing and comparing the nodes under different centralities, the change in a text’s central information in different situations can be analyzed. This approach helps us to gain a deeper understanding of text data and provides valuable information for decision-making in related fields.
In this study, the selected research object is the text data of a certain brand’s cell phone product reviews; the center of attention in user reviews is mined through keyword extraction in the review text. Under different keyword extraction algorithms, the top-ranked keywords may be different. According to the keywords extracted using different keyword extraction algorithms at the network level of the multilevel network, in addition to the repeated words at the intermediate level, the reviews of cell phones from a certain brand are categorized—according to their algorithmic extraction results—into different parts of the network level: the TFIDF layer, the repeated node layer, and the TextRank layer. After dividing the network level, the keyword nodes in different network levels are calculated according to the centrality of the multilevel network; the differences among the keywords in different centralities are analyzed, in addition to the primary focus among users in relation to the cell phone goods from a certain brand. This method can be used to build an understanding of users’ perceptions and needs for cell phone goods from the brand in order to better understand the market and develop marketing strategies.

3.2. Multilayer Network Construction

3.2.1. Text Preprocessing

Generally speaking, text data are mostly unprocessed when obtained, and the first step is text preprocessing. The first step of Chinese text preprocessing is Chinese word segmentation, the process of cutting a sequence of Chinese characters into words and then combining consecutive characters into a meaningful sequence of words according to certain rules. Unlike English, Chinese text has no spaces and no explicit boundaries between words, so the characters must be segmented and combined into meaningful phrases.
After the initial comments are segmented into words, there remain a number of high-frequency auxiliary words, tone words, and punctuation marks that carry little meaning and have almost no effect on text analysis; these are called stop words. Because stop words introduce interference into text analysis, they are deleted. In addition, during domain-specific text mining, words that are weakly connected to the knowledge topics and sentiment information of interest can be added to the stop-word list and filtered out, which better serves the efficacy of the sentiment knowledge analysis.
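As a minimal sketch of this preprocessing step (the paper does not name its segmentation tool, so jieba is assumed here, and the stop-word list is a small illustrative placeholder rather than the authors' actual list):

```python
# Hedged preprocessing sketch: Chinese word segmentation plus stop-word removal.
# jieba and the tiny stop-word set below are assumptions for illustration only.
import jieba

STOP_WORDS = {"的", "了", "很", "啊", "，", "。", "！"}  # placeholder stop-word list

def preprocess(review: str) -> list[str]:
    """Segment a Chinese review and drop stop words and empty tokens."""
    tokens = jieba.lcut(review)
    return [t for t in tokens if t.strip() and t not in STOP_WORDS]

reviews = ["手机拍照效果不错，运行速度快！"]
corpus = [preprocess(r) for r in reviews]
print(corpus)
```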

3.2.2. Keyword Extraction

TFIDF is the most commonly used method for calculating feature weights to assess the importance of a specified word in the whole text [23]. TF represents the frequency of the feature word appearing in the whole text—the higher the frequency of a word’s occurrence, the more important it is. IDF is the weighting factor that balances the weight of the words that appear more often in the document [24]. For words in a text, TF and IDF are calculated as follows:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}| + 1}$$
where $n_{i,j}$ represents the number of times word $i$ appears in document $j$; $\sum_{k} n_{k,j}$ represents the total number of word occurrences in document $j$; $|D|$ denotes the number of documents; and $|\{j : t_i \in d_j\}|$ denotes the number of documents containing the specified word, to which 1 is added to avoid a zero denominator.
The TF-IDF value is the product of TF and IDF:
$$tfidf_{i,j} = tf_{i,j} \cdot idf_i = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \cdot \log \frac{|D|}{|\{j : t_i \in d_j\}| + 1}$$
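As a small illustration, these TF-IDF weights can be computed directly from the formulas above; the toy corpus below is purely illustrative, and the +1 smoothing in the denominator follows the equation given here.

```python
# Toy TF-IDF computation following the formulas above.
import math
from collections import Counter

def tfidf(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Return per-document TF-IDF weights for a tokenized corpus."""
    D = len(corpus)
    doc_freq = Counter()                      # number of documents containing each term
    for doc in corpus:
        doc_freq.update(set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        total = sum(counts.values())          # sum_k n_{k,j} for document j
        weights.append({
            term: (n / total) * math.log(D / (doc_freq[term] + 1))
            for term, n in counts.items()
        })
    return weights

toy = [["屏幕", "清晰", "拍照", "清晰"], ["拍照", "速度", "快"]]
print(tfidf(toy)[0])
```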
TextRank is an unsupervised graph modeling algorithm based on an improvement on the PageRank algorithm, which converts the keyword extraction problem into an importance ranking problem of words [25]. The algorithmic idea of TextRank is as follows: firstly, the document is partitioned into words, and each partition is regarded as a network node; some kind of connection existing between the nodes is taken as the edge, which constitutes the graph model. The model is represented by G = ( V , E ) , node V represents the words and E represents the relationship between the words. For a given node, the more nodes with which there is a connection, the more important the node is. The traditional TextRank algorithm node importance score is calculated as shown in Equation:
$$WS(v_i) = (1 - d) + d \sum_{v_j \in In(v_i)} \frac{w_{ji}}{\sum_{v_k \in Out(v_j)} w_{jk}} WS(v_j)$$
where $d$ is the damping coefficient, which usually takes the value 0.85, and $1 - d$ denotes the probability that a particular node randomly jumps to a neighboring node. For a particular node $v_i$, $In(v_i)$ is the set of nodes pointing to $v_i$, $Out(v_j)$ is the set of nodes that $v_j$ points to, and $w_{ji}$ denotes the weight of the edge from the neighboring node $v_j$ to $v_i$.
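A hedged sketch of this ranking idea is shown below: a word co-occurrence graph is built over a sliding window and the nodes are ranked with weighted PageRank (d = 0.85). The window size and the use of networkx are assumptions rather than the authors' exact implementation.

```python
# TextRank-style keyword scoring: co-occurrence graph + weighted PageRank.
import networkx as nx

def textrank_keywords(tokens: list[str], window: int = 5, top_k: int = 10) -> list[str]:
    """Rank tokens by PageRank over a sliding-window co-occurrence graph."""
    G = nx.Graph()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            u, v = tokens[i], tokens[j]
            if u == v:
                continue
            w = G[u][v]["weight"] + 1 if G.has_edge(u, v) else 1
            G.add_edge(u, v, weight=w)
    scores = nx.pagerank(G, alpha=0.85, weight="weight")  # alpha plays the role of d
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

tokens = ["屏幕", "清晰", "拍照", "效果", "不错", "运行", "速度", "快", "屏幕", "拍照"]
print(textrank_keywords(tokens))
```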

3.2.3. Centrality Analysis

In order to reveal the importance and influence of the extracted keywords in greater depth, and to further study the focus of users’ attention on branded products, this study uses four metrics: degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality. Degree centrality is a direct measure of node centrality that intuitively reflects how frequently a keyword is mentioned together with others. Betweenness centrality characterizes the importance of a node through the number of shortest paths passing through it; nodes with high betweenness centrality play a “bridge” role and are key to connecting different topics or concepts. Closeness centrality reflects the proximity between a node and the other nodes in the network; if a keyword has high closeness centrality, it can be quickly associated with many other keywords. Eigenvector centrality identifies keywords that are connected to multiple important nodes and are therefore themselves of higher importance. The formulas and derivations for the four centralities are as follows:
Degree centrality is a quantitative measure of the importance of a node in terms of the number of nodes directly connected to the node; this is very simple to compute, but suffers from the problem of not being able to accurately assess the importance of a node [26].
$$DC_i = \frac{k_i}{N - 1}$$
where $k_i$ is the number of edges connected to node $i$ and $N - 1$ is the maximum possible number of edges between node $i$ and all other nodes.
Betweenness centrality is a quantitative measure of the importance of a node based on the number of shortest paths between all pairs of nodes in the network that pass through the given node; this is a metric with global characteristics and very high time complexity [27].
$$BC_i = \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}}$$
where $n_{st}^{i}$ denotes the number of shortest paths between nodes $s$ and $t$ that pass through node $i$, and $g_{st}$ denotes the total number of shortest paths connecting $s$ and $t$. Normalizing the above equation, we obtain
$$BC_i = \frac{1}{(N-1)(N-2)/2} \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}}$$
Closeness centrality quantifies the importance of a node by its centrality in the network, reflected by the inverse of the average shortest-path distance from the node to the other nodes; this metric depends strongly on the network topology [27].
$$CC_i = \frac{1}{d_i}, \quad d_i = \frac{1}{N - 1} \sum_{j=1}^{N} d_{ij}$$
where $d_i$ is the average distance from node $i$ to the other nodes; its reciprocal is the closeness centrality.
Eigenvector centrality assesses the importance of a node by treating the importance of each node as a linear combination of the importance of the other nodes in the network. This yields a system of linear equations; solving it gives the maximum eigenvalue, whose corresponding eigenvector encodes the position of each node in the network [28].
$$EC_i = x_i = c \sum_{j=1}^{N} a_{ij} x_j$$
where $c$ is a proportionality constant; $a_{ij} = 1$ when nodes $i$ and $j$ are connected, and 0 otherwise.
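For reference, the four centrality measures defined above can be computed with networkx as in the sketch below; the toy keyword co-occurrence graph is illustrative only.

```python
# Degree, betweenness, closeness, and eigenvector centrality on a toy keyword graph.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("手机", "拍照", 5), ("手机", "速度", 4), ("手机", "屏幕", 3),
    ("拍照", "效果", 3), ("速度", "运行", 2),
])

dc = nx.degree_centrality(G)                       # DC: normalized degree
bc = nx.betweenness_centrality(G)                  # BC: shortest-path mediation
cc = nx.closeness_centrality(G)                    # CC: inverse average distance
ec = nx.eigenvector_centrality(G, max_iter=1000)   # EC: influence of neighbors

for node in G.nodes():
    print(node, round(dc[node], 3), round(bc[node], 3),
          round(cc[node], 3), round(ec[node], 3))
```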

3.2.4. Multilevel Network

Multilayer networks constitute complex topological architectures formed by interconnected layers, which integrate multidimensional information to enable the holistic analysis of research subjects. The construction process comprises the following key steps.
First, the research object is defined as a knowledge network constructed from brand product reviews, with the research objective focused on identifying consumers’ primary concerns regarding the target brand. This necessitates the collection of product review data through web crawlers, followed by preliminary data cleaning and processing to mitigate interference from invalid characters, thereby ensuring data accuracy and reliability. The specific implementation workflow is outlined as follows:
Then, determine the network hierarchy: Identify the hierarchical structure of the multilayer network based on the research subject and objective. For example, in social networks, it can be divided into personal, group, and community levels. At the same time, define the nodes and edges in the network, along with their specific meanings in each layer. In a social network, nodes may represent individuals or groups, while edges can denote following or interaction relationships.
Next, establish a multilayer network model: A multilayer network model is constructed using the collected data and the defined hierarchical structure. Tools such as network analysis software (e.g., Gephi 0.10, Pajek 6.01) can be employed for its construction and analysis.
Finally, analyze and interpret the results: The structure and characteristics of the network model should be analyzed with a focus on key metrics such as degree centrality and clustering coefficients. The significance of these metrics and the factors influencing them should be interpreted to obtain valuable insights and provide references for subsequent research on the application of knowledge discovery in sentiment analysis.
Then, given that computations cannot directly process Chinese text comments, these linguistic data must be transformed into matrix vector representations. This transformation enables the computation of topological properties within the multilevel network and facilitates a more effective analysis of branded merchandise comment data. The specific implementation process is conducted as follows:
Construct an intralayer adjacency matrix. A multidimensional, multilayer network (brand knowledge multilayer network, BKMN) based on branded product reviews is built by connecting branded products according to the keyword associations within sentences. This network illustrates the relationships between keywords in product reviews. The interlayer adjacency matrix reflects the associations of co-occurring keywords in branded product reviews under various algorithms, revealing users’ common knowledge and concerns about brand product knowledge elements across different keyword extraction algorithms.
Construct the supra-adjacency matrix: in a multilevel network, the intralayer adjacency matrices occupy the diagonal blocks, while the interlayer matrices occupy the off-diagonal blocks. This matrix representation captures the characteristics and connections of the multilevel network structure. The matrix representation of a multilevel network is shown in Equation (10):
$$M = \begin{pmatrix} W^{11} & \cdots & D^{1L} I \\ \vdots & \ddots & \vdots \\ D^{L1} I & \cdots & W^{LL} \end{pmatrix}$$
The network architecture consists of two primary strata: a keyword stratum extracted from brand reviews using TF-IDF and another stratum derived through TextRank extraction. The intersection of keywords from both strata constitutes an intermediate bridging stratum. In the mathematical notation, W represents the intralayer adjacency matrix where the subscript denotes the network stratum; D indicates the interlayer adjacency matrix with subscripts specifying coupled strata. For experimental parameters, D functions as the interlayer weight matrix and I denotes the identity matrix.
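A minimal numpy sketch of assembling such a supra-adjacency matrix for two layers is given below; the intralayer matrices are random toy examples, and the interlayer coupling weight D is assumed to be 1.

```python
# Assemble a two-layer supra-adjacency matrix M: intralayer matrices W on the
# diagonal blocks, identity couplings scaled by the interlayer weight D elsewhere.
import numpy as np

n = 4                                   # nodes per layer (toy size)
rng = np.random.default_rng(0)
W1 = np.triu(rng.integers(0, 2, (n, n)), 1)
W1 = W1 + W1.T                          # symmetric 0/1 intralayer adjacency, layer 1
W2 = np.triu(rng.integers(0, 2, (n, n)), 1)
W2 = W2 + W2.T                          # layer 2
D = 1.0                                 # assumed interlayer coupling weight

M = np.block([
    [W1,            D * np.eye(n)],
    [D * np.eye(n), W2],
])
print(M.shape)                          # (8, 8) for two layers of 4 nodes each
```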
Considering the different keyword extraction algorithms in brand product review data, we can obtain the review data of a certain brand product and identify the node attributes and edge connections in the review data based on the actual needs of knowledge discovery. The specific steps are as follows:
In the brand product evaluation text, we perform Chinese word segmentation to extract words and remove some stop words that affect the analysis. The remaining words are defined as nodes in the multilayer network. In brand-related research, high-frequency keywords reflect the hotspots or issues mentioned by most users, while low-frequency keywords indicate the sporadic situations of products rarely mentioned by users. Therefore, we extract the top-ranked keywords in terms of frequency from the text and use them as nodes in the multilayer network. The edges between nodes within the same layer are established based on the connections between brand product evaluation sentences, forming a brand keyword knowledge network. If there is a connection, it is represented as 1; if there is no connection, it is represented as 0.
The construction process of a multilayer network for a certain brand’s mobile phone product reviews is implemented as follows: ① Node identification—review-keywords from the brand product dataset are identified as nodes, utilizing two distinct keyword extraction algorithms (TF-IDF and TextRank) to extract keywords from consumer reviews. ② Edge relationship identification—co-occurrence relationships between keywords appearing in different brand product review texts are identified. ③ Network layer division—the brand product review keywords are classified into two categories based on the different algorithms to construct two keyword network layers, thereby analyzing consumer preferences toward the brand products from two perspectives.
The co-occurrence matrix is utilized to ascertain the keyword nodes and their co-occurrence frequencies (node weights) within the product reviews of a cell phone brand; the edges and weights linking the different nodes are also identified. To demonstrate this, the weights of selected keywords for a cell phone brand are listed in Table 1, while the weights of the associated edges are displayed in Table 2.
A keyword multilayer network is established for brand product reviews: For a specific cell phone brand, the TF-IDF algorithm is employed to select the top 100 keywords and form a network layer termed the TF-IDF layer. Simultaneously, the TextRank algorithm is utilized to identify the top 100 keywords; the edges from the co-occurrence matrix are used to construct another network layer, designated as the TextRank layer.
These steps facilitate the construction of a multilayer network model encompassing TF-IDF, TextRank, and repeat node layers. This model provides in-depth insights into consumer focus and emotional tendencies toward the brand’s mobile phones, thereby strongly supporting the formulation of market strategies for the brand.
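The construction procedure can be summarized in the hedged sketch below, which builds the TF-IDF, overlap, and TextRank layers from the top-ranked keyword lists and review-level co-occurrences; tfidf_top, textrank_top, and reviews are assumed inputs produced by the extraction and preprocessing steps above.

```python
# Build the three BKMN layers from keyword lists and tokenized reviews.
import networkx as nx
from itertools import combinations

def build_layer(keywords: list[str], reviews: list[list[str]]) -> nx.Graph:
    """One network layer: nodes are keywords, edges are review-level co-occurrences."""
    kw = set(keywords)
    G = nx.Graph()
    G.add_nodes_from(kw)
    for tokens in reviews:
        present = set(t for t in tokens if t in kw)
        for u, v in combinations(present, 2):
            w = G[u][v]["weight"] + 1 if G.has_edge(u, v) else 1
            G.add_edge(u, v, weight=w)
    return G

def build_bkmn(tfidf_top: list[str], textrank_top: list[str],
               reviews: list[list[str]]) -> dict[str, nx.Graph]:
    overlap = [k for k in tfidf_top if k in set(textrank_top)]  # repeated-node layer
    return {
        "tfidf": build_layer(tfidf_top, reviews),
        "overlap": build_layer(overlap, reviews),
        "textrank": build_layer(textrank_top, reviews),
    }
```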

3.3. Hierarchical Classification of Brand Product Reviews

In the realms of information retrieval and text mining, the TF-IDF algorithm stands as a quintessential methodology that has been widely applied. It assesses the significance of particular terms within a corpus through the computation of the product between term frequency (TF) and inverse document frequency (IDF). Though it demonstrates superior performance in lexical weight quantification, TF-IDF’s intrinsic data sparsity constraints impede its efficacy when handling complex text analysis tasks. The algorithm’s primary dependence on lexical frequency metrics disregards contextual interdependencies, semantic correlations, and positional information, resulting in deficiencies in capturing deep semantic structures. To address TF-IDF’s constraints, scholars developed the TextRank paradigm employing graph-based ranking architectures. The TextRank algorithm, a natural language processing technique based on a graph ranking model, draws its core concept from the PageRank algorithm. By constructing a lexical co-occurrence network and iteratively computing node importance scores, it enables the automatic extraction of keywords from texts. This methodology efficiently capitalizes on the co-occurrence relationships within texts, addresses the ambiguity of polysemous and synonymous words, and demonstrates remarkable robustness and extensive applicability. Nevertheless, the TextRank algorithm encounters efficiency bottlenecks when processing multi-topical texts and lengthy documents, while its singular network structure encounters difficulties in effectively differentiating vocabulary pertaining to diverse topics or intentions. Although TextRank surpasses TF-IDF in dealing with semantic relationships, it remains incapable of fully grasping the intricate structure of texts. Consequently, over the past few years, researchers have initiated the exploration of incorporating the centrality concept from complex network analysis into keyword extraction tasks. Centrality analysis, a crucial instrument in complex network research, offers an effective means of quantifying node influence within networks. In the field of text analysis, the combination of multilayer network centrality with keyword extraction techniques can be used in the construction of a hierarchically organized text representation model. Through the comparative analysis of node attributes across diverse centrality indices, this approach uncovers core information evolution patterns across various contexts. It not only deepens our understanding of textual data but also provides new perspectives for cross-domain decision support.
Building upon these foundations, we introduce a hybrid keyword analysis framework integrating TF-IDF, TextRank, and multilayer network centrality, subsequently applying this methodology in the empirical analysis of consumer reviews for a specific mobile phone brand. Utilizing user-generated reviews pertaining to a specific mobile phone brand as our analytical corpus, we employ advanced keyword extraction methodologies to uncover salient consumer concerns. In view of the differences in vocabulary ranking among different keyword extraction algorithms, we propose a keyword-analysis method based on multilayer network centrality. Specifically, the keywords extracted using the TF-IDF and TextRank algorithms are used as different layers of the network, respectively, and the repeatedly occurring vocabulary constitutes the intermediate nodes connecting each layer. By constructing such a multilayer textual network model and computing centrality metrics for each layer’s nodes, an in-depth examination of the varying significance of keywords within different network architectures can be conducted, thereby uncovering users’ central points of interest regarding the brand’s mobile phones. This research not only helps to gain insight into consumer demand but also provides a data-driven decision-making basis for the brand’s market strategy formulation. In summary, we integrate multiple methods to build a more comprehensive and in-depth text analysis framework, offering a powerful tool for use in building an understanding of user reviews and devising marketing strategies.

3.4. Sentiment Analysis

3.4.1. Evaluation Indicators

In this study, for the task of sentiment analysis in knowledge discovery, we classify the sentiment tendency of cell phone brand product reviews and propose a Mul-LSTM model that is applicable in the classification of the sentiment tendency of cell phone brands.
The following four metrics are used for the classification metrics of Mul-LSTM:
Accuracy, which indicates the ratio of the number of correctly classified test samples to the total number of test samples, is calculated by the following formula:
$$Accuracy = \frac{TP + TN}{TP + FN + FP + TN}$$
Precision, which indicates the number of correctly categorized positive cases as a proportion of those categorized as positive, is calculated as follows:
$$Precision = \frac{TP}{TP + FP}$$
Recall indicates the ratio of the number of correctly categorized positive cases to the actual number of positive cases and is calculated by the following formula:
$$Recall = \frac{TP}{TP + FN}$$
The F1-score, which is the harmonic mean of recall and precision, is calculated as follows:
$$F1\text{-}score = \frac{2 \cdot Recall \cdot Precision}{Recall + Precision}$$
where TP denotes positive examples that are correctly classified, TN denotes negative examples that are correctly classified, FP denotes negative examples that are misclassified as positive, and FN denotes positive examples that are misclassified as negative.
In branded goods evaluation, the obtained reviews are categorized by determining the positive and negative sentiments of the user’s branded goods evaluation based on sentiment analysis.

3.4.2. Categorization and Analysis of Users’ Emotional Knowledge

Figure 1 shows how an input review sentence is transformed into word vectors and processed by the multilayer LSTM.
The figure shows the architecture of a deep learning-based sentence classification system, which consists of two main modules: sentence feature extraction (vectorization) and a deep neural network classifier. In the feature extraction module, the original sentence (input utterance) is first converted into word vectors through word embedding. These word vectors are then integrated into a sentence vector with the help of a Long Short-Term Memory (LSTM) network, completing the feature vectorization process. The resulting sentence vector is passed as input to the deep neural network classifier, processed in turn by the input and hidden nodes, and the classification result is finally produced at the output nodes, realizing the sentence classification task. An LSTM can express the relationships among the words in a sentence well, enabling the sentence to be viewed as a whole. The structure of the multilayer LSTM, drawn with the plot_model function, is shown in Figure 2.
Since the embedding_1_input layer accepts a sequence of integers, the index needs to be converted into a distributed word vector within the model; in this study, this is accomplished through using the embedding layer, which transforms positive integers into vectors with fixed dimensions.
Considering the computation time, a three-layer LSTM is designed here. All words are passed through the embedding layer to map the vocabulary into a vector space; 20 is the dimension of the embedding space, i.e., the size of the layer’s output vector for each word, and 200 denotes the 200 sequential features of each review. These features are fed into three stacked LSTM layers, whose outputs are 100, 32, and 16, respectively; the input of each LSTM layer is the output of the previous layer. Dropout layers between the LSTM layers discard part of the information in order to prevent the model from overfitting and to improve its generalization ability; the dropout probability is set to 0.5. The dense layer’s output is two-dimensional, corresponding to the binary classification of the review data.
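A Keras sketch of this architecture is given below. The vocabulary size is an assumption, and the layer settings are inferred from the description in the text rather than taken from the authors' code.

```python
# Hedged sketch of the Mul-LSTM architecture: embedding -> 3 stacked LSTMs -> softmax.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 200        # 200 features (token indices) per review

model = Sequential([
    Embedding(input_dim=VOCAB_SIZE, output_dim=20,
              input_length=MAX_LEN, mask_zero=True),
    LSTM(100, return_sequences=True),
    Dropout(0.5),
    LSTM(32, return_sequences=True),
    Dropout(0.5),
    LSTM(16),
    Dropout(0.5),
    Dense(2, activation="softmax"),    # binary classification of review sentiment
])
model.summary()
```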
The text of the cell phone brand reviews obtained from the Jingdong platform is processed as the experimental dataset of this study. Sentiment labels (positive or negative) are assigned manually. The Mul-LSTM model proposed in this study is then used to classify the sentiment of this Apple cell phone review dataset.

3.4.3. Embedding Layer Parameters

Firstly, the acquired data of more than 20,000 Apple cell phone product reviews are divided into 10 equal parts: 7 parts are used as model training samples, and 3 parts are used for testing and validating the model. The embedding layer is constructed to transform the discrete input features into continuous features and output them to the next layer. The specific settings are shown in Table 3.
Here, input_dim is the discrete feature input size, output_dim is the continuous feature output size, input_length is the input sequence length, and mask_zero determines whether or not the value “0” of the input should be considered as a special “padding” value; this is mainly used in dealing with the variable-length inputs at the recurrent network layer.

3.4.4. Mul-LSTM Model Parameters

The features input to the Mul-LSTM model are limited to 200, and the batch_size parameter is set to 64 to maximize the model’s performance. To prevent overfitting, dropout layers are added, with the dropout value of all three layers set to 0.5. The number of neurons in the hidden layer is set to 64, and the number of training epochs is set to 20. The Mul-LSTM model is then built to complete the training of the classification model; the parameter settings are shown in Table 4.
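Continuing the previous sketch, a training setup matching the reported parameters (a 7:3 train/test split, batch size 64, 20 epochs) might look as follows. The optimizer and loss function are assumptions, and the placeholder arrays stand in for the padded review sequences and the manually assigned sentiment labels.

```python
# Hedged training setup reusing `model`, VOCAB_SIZE, and MAX_LEN from the sketch above.
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholders for the padded review index sequences (X) and 0/1 sentiment labels (y).
X = np.random.randint(1, VOCAB_SIZE, size=(1000, MAX_LEN))
y = np.random.randint(0, 2, size=(1000,))

# 7:3 split of the dataset into training and test/validation samples.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer labels, 2-way softmax
              metrics=["accuracy"])
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    batch_size=64, epochs=20)
```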

4. Results

4.1. Nodal Identification of Brand Product Reviews

In this study, we collected 19,875 consumer reviews across 11 smartphone product series of a specific brand through web crawling on the JD.com platform. Following text segmentation processing, the top 100 keywords were extracted through the combined implementation of TF-IDF weighting and TextRank graph-based ranking algorithms. The results are presented in Table 5.
By analyzing the extraction results of the TF-IDF and TextRank algorithms, we were able to identify key user concerns regarding the selected brand’s smartphone offerings. Under the TF-IDF algorithm, the top ten keywords are ranked as follows: phone, speed, screen, photography, operation, appearance, effect, sound effect, shape, and like. Under the TextRank algorithm, the top ten keywords are ranked as follows: screen, phone, photography, sound effect, appearance, standby time, speed, very, operation, and battery. A comparison reveals that, although there is a certain degree of overlap between the keywords extracted using the two algorithms—such as “phone”, “screen”, “photography”, “appearance”, and “operation”—there are differences in the importance ranking of the keywords. For example, “speed” is ranked higher using TF-IDF but is ranked relatively lower using TextRank; conversely, “sound effect” ranks higher using TextRank but is not ranked higher when using TF-IDF. This indicates that different algorithms have different assessments of the importance of words in text, possibly due to the different factors they consider when calculating word weights. Based on this, we can regard the keywords extracted using the two algorithms as network nodes and construct edges based on co-occurrence relationships, thereby determining the intralayer connections of TF-IDF and TextRank, respectively. Interlayer connections are made based on the principle of node name identity, ultimately constructing a multilayer network model comprising the TF-IDF layer, TextRank layer, and repeated node layer.
On e-commerce platforms like JD.com, consumers can review purchased products, and these reviews form the main content of online evaluations. As the primary medium of information dissemination, review content serves as a conduit for transmitting actionable market intelligence—including consumer preferences and product performance metrics—to stakeholders such as prospective buyers, retailers, and brand custodians. To deeply explore consumer sentiment and product feature information embedded in review data, this section builds a multilayer network model based on brand product reviews, keyword extraction algorithms, and feature associations. In this model, the association rules between nodes are as follows:
(1) TF-IDF network layer: the nodes represent TF-IDF-derived keywords, with the edges constructed based on co-occurrence relationships within individual smartphone reviews.
(2) TextRank network layer: the nodes correspond to TextRank-generated keywords, with the edges defined by lexical co-occurrence within singular review instances.
(3) Shared node network layer: the nodes consist of intersectional keywords identified by both TF-IDF and TextRank, with the edges reflecting cross-product lexical associations within the brand’s review corpus.
Within multilayer network architectures, node influence is determined by three interdependent factors: the aggregated degree centrality of both focal nodes and their intralayer counterparts, the betweenness centrality quantified by path traversal frequency, and the systemic dominance of specific node types within heterogeneous networks proportional to their population density. Intralayer node interactions and interlayer connectivity collectively characterize the heterogeneous attributes of nodes within multilayered network frameworks. In sentiment–feature co-analysis, the lexical–semantic overlap of multi-domain keywords within consumer reviews induces cross-layer semantic bridging, thereby dynamically constructing interlayer edges in multi-modal network architectures. Intralayer nodes across distinct network strata are interconnected via three primary mechanisms—review co-occurrence patterns, feature keyword co-occurrence distributions, and sentiment affinity metrics—which are synthetically integrated into a multilayered analytical framework for brand sentiment intelligence mining. This model yields granular insights into consumer sentiment polarity and domain-specific attention allocation patterns toward the brand’s mobile devices, which serve as actionable data assets to inform evidence-based strategic decision-making in competitive market positioning.

4.2. Multilayer Network Analysis of BKMN

Firstly, based on the keyword nodes identified in Section 4.1, the keywords extracted by the TF-IDF algorithm from the Jingdong review data of the cell phone are taken, and the keyword nodes duplicated between the TF-IDF and TextRank extractions are removed. The four centrality scores (DC, BC, CC, and EC) of the remaining nodes are then computed with the centrality methods in Section 3.2.3 and ranked. After that, histograms for the different centrality measures are plotted separately, as shown in Figure 3.
From the four centrality measures in the figure, we can see that “operation”, “performance”, and “standby” rank at the top of both degree centrality and closeness centrality, indicating that these three keywords co-occur most frequently with other words, can be quickly associated with other topics, and are the core functional attributes discussed by users. The betweenness centrality of “operation” stands out, indicating that it is the key mediator connecting different functional topics such as “performance” and “system”. The eigenvector centrality of “operation” and “performance” is also high, because they are often associated with other high-influence nodes such as “speed”, which further reinforces their importance at the statistical feature level. In addition, the cell phone colors that consumers pay attention to are concentrated in black and green, the keyword “girls” reflects the brand’s strong appeal to female consumers, and the keyword “hope” reflects consumers’ expectations of the brand. Overall, this reflects users’ frequent attention to the core hardware indicators of the cell phones.
Next, the same method is applied to the keywords extracted by the TextRank algorithm from the Jingdong review data of the cell phone: based on the keyword nodes identified in Section 4.1, the keyword nodes duplicated between the TF-IDF and TextRank extractions are removed, and the four centrality scores (DC, BC, CC, and EC) of the remaining nodes are calculated with the centrality methods in Section 3.2.3 and ranked. These are then plotted as bar charts for the different centrality measures, as shown in Figure 4.
As can be seen from the four centrality measures in the figure, the top-ranked nodes have centralities of similar magnitude, indicating that, within the network layer built from the keywords extracted by this algorithm, these nodes carry comparable influence and are mentioned by consumers with similar probability. The degree centrality and eigenvector centrality of “delicate”, “comfortable”, and “clear” rank at the front, reflecting that these descriptive words are closely related in semantic context to important nodes such as “appearance” and “feel”; they are core expressions of users’ subjective experience and indicate that consumers perceive the brand’s goods favorably. The betweenness centrality of “12mini” is significant: it acts as an intermediary connecting topics such as “small screen” and “battery life” and becomes a hub for discussion of specific models, indicating that consumers pay particular attention to the 12 mini series and to the size and usage time of the brand’s phones. The high closeness centrality of “clear” means it can be quickly associated with vision-related topics such as “screen” and “photo”. Overall, this layer reflects users’ semantic focus on usage experience and specific models, forming a perspective complementary to the function-oriented TF-IDF layer.
Secondly, the keywords extracted by both the TF-IDF and TextRank algorithms from the Jingdong Apple cell phone review data were used to construct the repeated-node network layer. The results show that the first ten keyword nodes have identical centrality values, indicating that the two algorithms assign the same influence and control to the top ten nodes of the network, reflecting strong control over the network’s information flow. Combining the two keyword extraction algorithms with centrality, it can be concluded that the photo effect, running speed, screen, appearance, shape, and sound effects constitute the central focus of consumer reviews.
Finally, the multilevel network is constructed from the TF-IDF layer, the TextRank layer, and the overlap layer (the layer of overlapping keywords). To improve the readability of the visualization, edges with lower weights are deleted so that the focus falls on the more closely connected nodes. Node size is proportional to degree in the multilevel network: the larger the degree, the larger the node. The network was plotted with Python 3.13.5 using toolkits such as pymnet, matplotlib, networkx, and pandas; the resulting multilevel network is shown in Figure 5. It shows the connections between keywords at different network levels, with edges and nodes of different weights. Since the network is based on a Chinese corpus, the nodes in the graph are Chinese words.
In the BKMN network, the overlap layer exhibits the highest node degrees, followed by the TextRank layer, with the TF-IDF layer showing the lowest node degrees. The key consumer concerns across the layers focus on “photography”, “performance”, “speed”, “audio quality”, “design”, and “battery life”. Among consumers who recently purchased and reviewed a mobile phone, there is a relatively high level of attention paid towards the iPhone12 mini.
To ensure that the plotted keywords do not overlap, and to enable us to consider the number of overlapping keyword nodes among the two algorithms, 60 keyword nodes were selected according to their degree of centrality and sorted in order to plot the four centrality trends, as shown in Figure 6.
This selection reveals that the first 15 keywords in the BKMN multilevel network are the focal points of consumer attention in brand reviews. The degree centrality curve shows a pattern similar to that of betweenness centrality, whereas the curves for closeness and eigenvector centralities exhibit relatively minor fluctuations.
A heatmap, integrated with the FR force-directed algorithm for node layout, unveils an overall three-layer network structure, as depicted in Figure 7.
The color distribution within the BKMN multilayer network illustrates node distribution under different centralities. Most nodes exhibit low betweenness centrality, yet they possess high degree, closeness, and eigenvector centralities. Few nodes with high betweenness centrality exist in the network, indicating that only a small number of keywords act as the shortest-path nodes in branded product reviews. These nodes perform a vital “mediating” function in information transmission and semantic expression.
The characteristics of the multilayer network facilitate a more comprehensive and in-depth understanding of the network structure and the relationships between nodes. By analyzing the network topology features of each individual layer as well as the overall multilayer network, the results presented in Table 6 were obtained.
The structural hierarchy analysis reveals that 60 intersectional nodes were co-identified by both the TF-IDF and TextRank algorithms. These nodes exhibit cross-algorithmic dominance in multilayered network architectures, signifying consensus-driven consumer focus points within brand-related review ecosystems.
A comparative analysis of the average degree centrality metrics indicates that the network layer derived from TextRank surpasses its TF-IDF counterpart in edge density. However, the shared node subgraph shows reduced connectivity. The BKMN multiplex network demonstrates the highest global degree centrality, reflecting a high level of node interconnectivity across hierarchical layers. This structural pattern suggests that TextRank prioritizes semantically cohesive keyword clusters, while the heterogeneity in degree distributions across algorithms arises from differential weighting mechanisms.
An analysis of the network diameter indicates that both the individual layers and the overall multilevel network have a diameter of 2. This signifies that the maximum distance between any two keyword nodes is at most two steps, reflecting the close proximity of consumer attention points. Additionally, the average path length within each network layer remains below 2, suggesting that the multilevel network exhibits small-world properties. This facilitates the more effective observation of node propagation and influence.
An analysis of the clustering coefficients reveals that the average clustering coefficient of the multilayer network exceeds 0.8, with each individual layer also exceeding 0.8. This indicates that, although the co-occurrence levels of the keywords extracted by the different algorithms vary, the keywords consistently co-occur with high frequency and are strongly interconnected. Consequently, consumer attention within this brand's mobile phone product category appears to be tightly clustered.
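The topological indicators reported in Table 6 can be reproduced per layer with standard networkx calls. The sketch below assumes the layers dictionary from the earlier snippet, restricts path-based measures to the largest connected component, and uses the unweighted average degree 2E/N; the figures in Table 6 may additionally incorporate edge weights.

```python
import networkx as nx

def layer_topology(g: nx.Graph) -> dict:
    """Indicators reported in Table 6 for a single layer (unweighted variants)."""
    # Path-based measures require a connected graph, so compute them on the
    # largest connected component.
    giant = g.subgraph(max(nx.connected_components(g), key=len))
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "average_degree": 2 * g.number_of_edges() / g.number_of_nodes(),
        "diameter": nx.diameter(giant),
        "average_clustering": nx.average_clustering(g),
        "average_path_length": nx.average_shortest_path_length(giant),
    }

# Example: for name, g in layers.items(): print(name, layer_topology(g))
```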

4.3. Brand Sentiment Analysis Based on Multilayer Networks

4.3.1. Mul-LSTM Result Analysis

The loss metric measures the error between the model's predictions and the actual labels. To better observe the model's behavior and support the validity of the conclusions, the loss curves over 20 training epochs were plotted from the model-fitting results, as shown in Figure 8.
The training-set loss shows a continuous, smooth downward trend, indicating that the model's fit to the training data gradually improves and that the learning process is stable overall, effectively capturing the sentiment feature patterns in the training set. The test-set loss fluctuates around 0.18 and does not decline in step with the training loss. This stability suggests that the model's generalization on the test data remains steady and that no obvious overfitting occurred during training; the small fluctuations in the test loss may be related to differences in the data distribution across batches.
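For reference, a training setup consistent with Tables 3 and 4 (embedding input_dim = 200, output_dim = 20, mask_zero = True; 64 LSTM cells; dropout 0.5; 20 epochs; batch size 64) can be sketched in Keras. The exact internal arrangement of Mul-LSTM follows Figure 2, so the two stacked LSTM layers below are an assumption, and the data arrays are random placeholders standing in for the integer-encoded, padded review sequences; Table 3's input_length = 200 corresponds to the padded sequence length SEQ_LEN.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

VOCAB_SIZE, EMBED_DIM, SEQ_LEN = 200, 20, 200          # Table 3
LSTM_CELLS, DROPOUT, EPOCHS, BATCH = 64, 0.5, 20, 64   # Table 4

model = Sequential([
    # mask_zero=True masks the padding index 0, as in Table 3.
    Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM, mask_zero=True),
    LSTM(LSTM_CELLS, return_sequences=True),   # first LSTM layer
    Dropout(DROPOUT),
    LSTM(LSTM_CELLS),                          # second LSTM layer (assumed depth)
    Dropout(DROPOUT),
    Dense(1, activation="sigmoid"),            # binary sentiment output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Placeholder data: padded review sequences (ints in [1, VOCAB_SIZE)) and 0/1 labels.
x_train = np.random.randint(1, VOCAB_SIZE, size=(256, SEQ_LEN))
y_train = np.random.randint(0, 2, size=(256,))
x_test = np.random.randint(1, VOCAB_SIZE, size=(64, SEQ_LEN))
y_test = np.random.randint(0, 2, size=(64,))

history = model.fit(x_train, y_train, validation_data=(x_test, y_test),
                    epochs=EPOCHS, batch_size=BATCH, verbose=0)

# Train/test loss curves, as in Figure 8.
plt.plot(history.history["loss"], label="train")
plt.plot(history.history["val_loss"], label="test")
plt.xlabel("epoch"); plt.ylabel("loss"); plt.legend(); plt.show()
```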
The classification metrics calculated for the Mul-LSTM model proposed in this paper are shown in Table 7.
The validation set contains 3364 positive and 2995 negative review texts; in the table, "0" denotes negative reviews and "1" denotes positive reviews. The classes are slightly imbalanced, with somewhat more positive than negative samples, but the difference is small. The F1-score, the harmonic mean of precision and recall, is 0.93; the macro average over the two classes is 0.93; and the weighted average is also 0.93. The binary sentiment classification of the Apple cell phone user reviews is therefore highly accurate.
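The per-class precision, recall, and F1 values in Table 7, together with the macro and weighted averages, correspond to scikit-learn's classification_report; a minimal sketch, in which y_true and y_prob are placeholders standing in for the validation labels and the Mul-LSTM output probabilities:

```python
import numpy as np
from sklearn.metrics import classification_report

# Placeholder validation labels (0 = negative, 1 = positive) and
# predicted probabilities from the sentiment model.
y_true = np.array([0, 1, 1, 0, 1, 0])
y_prob = np.array([0.10, 0.83, 0.61, 0.42, 0.95, 0.07])

y_pred = (y_prob >= 0.5).astype(int)  # threshold the sigmoid output
print(classification_report(y_true, y_pred, digits=2))
```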
The results show that the Mul-LSTM model proposed in this study improves, to a certain extent, upon the performance of the pre-trained language model on the cell phone brand dataset, with a clear gain in the positive/negative binary sentiment classification of product reviews. This indicates that the Mul-LSTM approach is effective for knowledge discovery from the product reviews of a given brand.

4.3.2. Comparison of Experimental Results

The performance of the Mul-LSTM model proposed in this study was compared with that of machine learning algorithms, including SVM, KNN, Adaboost, Xgboost, decision tree, random forest, and naive Bayes, as well as a single-layer LSTM model, in order to compare the classification performance of the different algorithms on cell phone brand reviews. The results are shown in Table 8.
In this study, we conducted sentiment classification experiments on the processed cell phone brand review dataset, selecting eight binary sentiment classification algorithms as baselines, evaluating them with ten-fold cross-validation, and reporting macro-averaged metrics (the arithmetic mean over the two classes). Comparing these results, the single-layer LSTM has the lowest accuracy, at 0.4594, followed by the KNN algorithm at 0.5087. The SVM, decision tree, and random forest algorithms sit in the middle-to-upper range, with accuracies between 0.8 and 0.9. Adaboost and Xgboost perform better, at 0.9094 and 0.9177, respectively, while the Mul-LSTM approach proposed in this study reaches 0.9308. This is an improvement over every baseline, ranging from about 0.013 over the strongest (Xgboost) to about 0.47 over the single-layer LSTM, and yields the best classification results overall.
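The baseline comparison can be organized with scikit-learn's cross-validation utilities. The sketch below is illustrative only: X and y are random placeholders standing in for the vectorized review texts and their sentiment labels, the model list is abbreviated, and default hyper-parameters are used, so the scores will not match Table 8.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Placeholder feature matrix and labels; in the study these would be the
# vectorized review texts and their 0/1 sentiment labels.
rng = np.random.default_rng(0)
X = rng.random((200, 50))
y = rng.integers(0, 2, 200)

baselines = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Adaboost": AdaBoostClassifier(),
    "Random Forest": RandomForestClassifier(),
}
scoring = ["accuracy", "recall_macro", "f1_macro"]  # macro averages, as in Table 8

for name, clf in baselines.items():
    scores = cross_validate(clf, X, y, cv=10, scoring=scoring)
    print(name,
          round(scores["test_accuracy"].mean(), 4),
          round(scores["test_recall_macro"].mean(), 4),
          round(scores["test_f1_macro"].mean(), 4))
```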

5. Conclusions

This study set out to develop a multilayer-network-based model for brand knowledge discovery. Data collection was carried out with web crawler technology to gather the product review dataset of a specific mobile phone brand on JD.com. A co-occurrence matrix was then constructed to identify the nodes and edges within the reviews. The TF-IDF and TextRank algorithms were applied independently to extract keywords, which were designated as two distinct network layers, and the nodes identified by both algorithms were integrated as an interconnecting intermediate layer, establishing a three-layer multilayer network structure. A multilayer network analysis of the brand reviews was then conducted: different centrality analyses were used to discover the central keywords of the brand's product reviews, and the semantic core of the reviews was mined by re-ranking the keywords according to centrality. Finally, the Mul-LSTM sentiment analysis model was constructed to perform sentiment analysis on the brand review texts and mine consumers' attitudes towards the brand.
The study found that, judging from the information that received the most attention in the consumer comments across the three network layers, consumers making a purchase decision for this brand are most concerned about the camera function, running speed, sound quality, appearance and aesthetics, and battery life of the phone, among other factors. Companies can optimize the hardware configuration of their products accordingly (e.g., upgrading the camera sensor and increasing the battery capacity) and strengthen the marketing of selling points such as "night shooting" and "all-day battery life". These network-topology-based insights reveal more implicit needs than traditional word frequency analyses; for example, the strong association of "photo" with "effect" and "pixel" (TextRank layer edge weight 0.468) suggests that users care about the overall image quality experience rather than a single parameter. Through the multilayer network centrality ranking, enterprises can quantify the differences in influence between attributes. For example, in the brand reviews, the degree centrality of "operation speed" is higher than that of "color", which implies that performance optimization is more effective in improving brand reputation and should receive priority in R&D resource allocation. In terms of sentiment analysis, this study evaluated the binary sentiment classification performance of the proposed Mul-LSTM model using the classification metrics; after parameter tuning, the model reaches a classification accuracy of 93%, with good performance on both positive and negative reviews. Compared with other algorithms, such as SVM and decision tree, the Mul-LSTM model performs better on the cell phone brand review data. The findings also show that there were slightly more positive reviews than negative reviews, indicating that users hold an overall positive attitude towards this brand.
This study has some limitations. In terms of modeling, nodes were identified from the perspectives of the TF-IDF and TextRank algorithms, and the overlap layer integrates the characteristics that both algorithms consider; however, both TF-IDF and TextRank rely on word frequency statistics, so the model does not sufficiently capture low-frequency but important attributes, and the overlap layer is susceptible to the influence of high-frequency words. Future research could consider other keyword extraction algorithms and additional network layers to capture deeper semantic associations, thereby improving the model's ability to capture the brand's implicit needs, for example by incorporating knowledge graphs or network-based recommendation methods. In terms of research content, this paper only constructs and analyzes a multilayer network model for brand knowledge discovery on one brand's review data, and the test-set indicators in the sentiment analysis do not reach their optimum. Future research could optimize the model, validate it on a better dataset, or further study the specific mechanisms through which positive and negative brand evaluations influence brand reputation.
The theoretical and practical implications of this work can be summarized as follows. Theoretically, the research moves beyond conventional single-layer network analysis: the BKMN framework harmonizes TF-IDF and TextRank through the mediation of their shared nodes, reconciling algorithmic discrepancies and establishing a new paradigm for multilayer network modeling in brand analytics, and the sentiment analysis method is also tailored to brand knowledge discovery. Practically, the three-layer network analysis framework and modeling approach constructed in this study are fairly general; after parameter tuning, they can be extended to knowledge discovery for household appliance, beauty, and other brands. The findings can provide precise directions for enterprises' product iteration, reveal users' emotional bias towards branded products, support the timely adjustment of marketing strategies and product selling points, and supply data support for precise marketing and product promotion programs.

Author Contributions

Conceptualization, P.X. and R.Z.; methodology, P.X.; software, R.Z.; validation, Z.W. and Z.S.; formal analysis, R.Z.; investigation, P.X.; resources, P.X.; data curation, R.Z.; writing—original draft preparation, P.X.; writing—review and editing, R.Z.; visualization, R.Z.; supervision, Z.S.; project administration, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Figure 1. Deep neural network.
Figure 2. Structure of Mul-LSTM.
Figure 3. TF-IDF centrality (partial).
Figure 4. TextRank layer centrality (partial).
Figure 5. BKMN multilayer network.
Figure 6. Four trends of centrality change.
Figure 7. Four measures of centrality.
Figure 8. Loss plot.
Table 1. Partial keyword nodes of a smartphone brand.
Label | Weight
Phone | 626
Screen | 458
Speed | 444
Very | 440
Camera | 417
Performance | 391
Design | 390
Quality | 375
Like | 340
Good | 323
Table 2. Partial keyword edges of a smartphone brand.
Source | Target | Weight
Camera | Quality | 505
Performance | Speed | 477
Screen | Camera | 468
Screen | Speed | 446
Camera | Speed | 442
Screen | Quality | 441
Camera | Performance | 427
Screen | Performance | 422
Design | Speed | 417
Quality | Speed | 416
Table 3. Description of embedding layer parameters.
Adjustable Parameter | Value
input_dim | 200
output_dim | 20
input_length | 200
mask_zero | True
Table 4. Mul-LSTM model parameters.
Adjustable Parameter | Value
Number of features | 200
Epochs | 20
batch_size | 64
LSTM cells | 64
Dropout | 0.5
Table 5. Top 100 keywords.
Algorithm | Top 100 Keywords
TF-IDF | Phone, Speed, Screen, Photography, Operation, Appearance, Effect, Sound Effects, Design, Like, A Certain, Compact, Standby Time, Feel, JD.com, No, Attractive, Usage, Feeling, Feature, Suitable, System, Satisfied, Price, Up, Issue, Logistics, Single-hand Operation, Color, Battery, Worth, Purchase, Received, Event, Charging, Single-hand, Quality, Genuine, Packaging, Buy, Operation, Blue, Experience, Enough, White, Very Beautiful, Express, Sound Quality, Design, Size, Video, Shopping, Gaming, Resolution, Classic, Recommend, Cute, Function, Performance, Choice, Not, Top-notch, Awesome, Display, Girls, Hope, Pixel, Memory, Looks, A Bit, Green, Border, Discount, Order, Standby, Trustworthy, Good to Use, Got, Time, Think, When, Cost-effective, Battery Life, Shipment, Bargain, Worthwhile, Habit, Backup Phone, Basic, Black, Positive Review, Camera, Don’t Need, Sound, Device, Also, Service, Bought, Processor, Body
TextRank | Screen, Phone, Photography, Sound Effects, Appearance, Standby Time, Speed, Very, Operation, Feel, Effect, A Certain, Compact, Design, Nice, Like, JD.com, Smooth, Mini, Attractive, Single-hand Operation, Fast, 12 Mini, 12, Clear, Convenient, Really, Feeling, Small Screen, Battery Life, Feature, Can, Suitable, Elegant, Usage, Satisfied, Comfortable, Size, Still, Special, Other, Day, Battery, Charging, System, Logistics, Great, Suitable, Single-hand, No, Genuine, This, Buy, Price, Always, Size, Color, Express, Enough, Received, Compared, Perfect, Super, Before, Blue, Up, Is, Mini, 618, Compact, Very Beautiful, Packaging, No Complaints, Sound Quality, Cost-effective, Experience, Purchase, Charge, This Model, Gaming, A14, Backup Phone, Pixel, Consistent, Order, Good to Use, Worth, Resolution, Awesome, Looks, Issue, Quality, Many, White, Event, Memory, Shopping, Indeed, A Bit
Table 6. Topological characteristics of the BKMN network.
Layer | Number of Nodes | Number of Edges | Network Diameter | Average Degree | Average Clustering Coefficient | Average Path Length
TF-IDF Layer | 100 | 3915 | 2 | 79.898 | 0.865 | 1.176
TextRank Layer | 100 | 4134 | 2 | 87.957 | 0.953 | 1.054
Overlap Layer | 60 | 1616 | 2 | 54.847 | 0.953 | 1.054
BKMN | 140 | 8049 | 2 | 91.220 | 0.848 | 1.627
Table 7. Apple brand emotional knowledge classification indicator effectiveness table.
Class | Precision | Recall | F1-Score | Support
0 | 0.94 | 0.91 | 0.93 | 2995
1 | 0.92 | 0.95 | 0.94 | 3364
accuracy | | | 0.93 | 6359
macro avg | 0.93 | 0.93 | 0.93 | 6359
weighted avg | 0.93 | 0.93 | 0.93 | 6359
Table 8. Comparison of macro averages for sentiment classification under different algorithms.
Algorithm | Accuracy | Recall | F1-Score
SVM | 0.8542 | 0.8587 | 0.8542
KNN | 0.5087 | 0.5393 | 0.4475
Adaboost | 0.9094 | 0.9098 | 0.9090
Xgboost | 0.9177 | 0.9180 | 0.9173
Decision Tree | 0.8215 | 0.8294 | 0.8213
Random Forest | 0.8316 | 0.8390 | 0.8315
Naive Bayes | 0.7469 | 0.7331 | 0.7329
LSTM | 0.4594 | 0.5000 | 0.3148
Mul-LSTM | 0.9308 | 0.9295 | 0.9304
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
