From UGC to Brand Product Improvement: Mining Consumer Innovation Insights Across Social Media Platforms
Abstract
1. Introduction
2. Literature Review
2.1. Mining Consumer Insights
2.2. Mining Innovation Insights
2.3. LLM Augmentation and Siamese Network
3. Research Methodology
3.1. Data Collection
3.1.1. Research Context
3.1.2. Data Acquisition
3.1.3. Data Preprocessing
3.2. Constructing the Ground Truth
3.2.1. Concept Definition and Annotation Process
3.2.2. Sparsity and Heterogeneity in Data Distribution
3.3. Model Development
3.3.1. Methodological Spectrum: Baseline Models
3.3.2. The Proposed Framework: LLM-Augmented Siamese Framework
3.3.3. Phase I: Active Semantic Augmentation via Cognitive Proxies
3.3.4. Phase II: Semantic Consistency Learning via Siamese Network
3.3.5. Optimization Strategy
3.4. From Insight Detection to Strategic Analysis
3.4.1. Semantic Clustering and Taxonomy Generation
3.4.2. Value-Urgency Prioritization Mechanism
4. Empirical Analysis and Results
4.1. Evaluation Metrics and Experimental Design
4.1.1. Evaluation Metrics
4.1.2. Experimental Design Scenarios
4.1.3. Model Configuration and Implementation
4.2. Analysis on the Aggregated Dataset
4.3. Single Platform Analysis
4.4. Leave-One-Platform-Out Analysis
5. Additional Analysis
5.1. Uncovering Latent Innovation Themes
5.2. Prioritizing Innovation: The Value-Urgency Matrix
6. Discussion
6.1. Academic Implications
6.2. Practical Implications
7. Limitations and Future Research
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Additional Results
| Model | Douyin | Xiaohongshu | Bilibili | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | F1 | PR-AUC | Precision | Recall | F1 | PR-AUC | Precision | Recall | F1 | PR-AUC | Precision | Recall | F1 | PR-AUC | Precision | Recall |
| Model 1 | 0.316 | 0.167 | 0.333 | 0.300 | 0.278 | 0.212 | 0.556 | 0.185 | 0.485 | 0.604 | 0.446 | 0.532 | 0.468 | 0.321 | 0.379 | 0.611 |
| Model 2 | 0.286 | 0.183 | 0.273 | 0.300 | 0.356 | 0.381 | 0.444 | 0.296 | 0.719 | 0.740 | 0.762 | 0.681 | 0.323 | 0.270 | 0.385 | 0.278 |
| Model 3 | 0.667 | 0.793 | 0.636 | 0.700 | 0.667 | 0.719 | 0.667 | 0.667 | 0.857 | 0.931 | 0.886 | 0.830 | 0.714 | 0.880 | 1.000 | 0.556 |
| Model 4 | 0.636 | 0.760 | 0.583 | 0.700 | 0.600 | 0.652 | 0.652 | 0.556 | 0.842 | 0.932 | 0.833 | 0.851 | 0.757 | 0.827 | 0.737 | 0.778 |
| Model 5 | 0.667 | 0.812 | 0.636 | 0.700 | 0.679 | 0.749 | 0.692 | 0.667 | 0.882 | 0.942 | 0.891 | 0.872 | 0.765 | 0.816 | 0.812 | 0.722 |

| Model | Douyin | Xiaohongshu | Bilibili | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | F1 | PR-AUC | Precision | Recall | F1 | PR-AUC | Precision | Recall | F1 | PR-AUC | Precision | Recall | F1 | PR-AUC | Precision | Recall |
| Model 1 | 0.385 | 0.367 | 0.375 | 0.396 | 0.404 | 0.309 | 0.348 | 0.483 | 0.479 | 0.551 | 0.676 | 0.371 | 0.372 | 0.354 | 0.265 | 0.624 |
| Model 2 | 0.469 | 0.432 | 0.535 | 0.418 | 0.529 | 0.545 | 0.614 | 0.466 | 0.660 | 0.705 | 0.780 | 0.573 | 0.447 | 0.433 | 0.483 | 0.416 |
| Model 3 | 0.817 | 0.876 | 0.885 | 0.758 | 0.816 | 0.890 | 0.850 | 0.784 | 0.909 | 0.957 | 0.937 | 0.882 | 0.827 | 0.896 | 0.804 | 0.851 |
| Model 4 | 0.810 | 0.858 | 0.883 | 0.747 | 0.820 | 0.890 | 0.881 | 0.767 | 0.801 | 0.953 | 0.981 | 0.677 | 0.758 | 0.902 | 0.655 | 0.901 |
| Model 5 | 0.810 | 0.881 | 0.883 | 0.747 | 0.819 | 0.893 | 0.915 | 0.741 | 0.892 | 0.964 | 0.935 | 0.852 | 0.762 | 0.896 | 0.659 | 0.901 |

| Platform | Interaction Focus | Content Modality |
|---|---|---|
| Xiaohongshu | Experience sharing | Image–text note |
| Weibo | Public discourse | Microblogging |
| Bilibili | Interest interaction | Long-form video |
| Douyin | Instant expression | Short video |

| Platform | Raw Count | Valid Count | Avg. Length |
|---|---|---|---|
| Xiaohongshu | 55,999 | 31,290 | 24.01 |
| Weibo | 31,767 | 27,501 | 23.42 |
| Bilibili | 52,228 | 36,270 | 31.70 |
| Douyin | 70,971 | 38,477 | 22.53 |
| Total | 210,965 | 133,538 | 25.55 |
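
The Total row of the table above can be cross-checked from the four per-platform rows. The following is a minimal sketch, not the authors' code: it assumes the overall "Avg. Length" is a mean weighted by valid count (an assumption the reported figures happen to satisfy), and the `retention` ratio is a derived figure that does not appear in the table.

```python
# Per-platform rows in table order: (raw count, valid count, average length).
rows = [
    (55_999, 31_290, 24.01),
    (31_767, 27_501, 23.42),
    (52_228, 36_270, 31.70),
    (70_971, 38_477, 22.53),
]

total_raw = sum(raw for raw, _, _ in rows)
total_valid = sum(valid for _, valid, _ in rows)

# Assumption: the overall average length is weighted by each platform's valid count.
weighted_avg_len = sum(valid * length for _, valid, length in rows) / total_valid

# Share of raw items counted as valid (derived here, not reported in the table).
retention = total_valid / total_raw

print(total_raw, total_valid, round(weighted_avg_len, 2), round(retention, 3))
# prints: 210965 133538 25.55 0.633
```

Under this weighting the recomputed totals match the reported Total row, and roughly 63% of the raw items are counted as valid.
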
| Original Review (x) | Augmented Variant 1 | Augmented Variant 2 |
|---|---|---|
| As a hearing impaired user, […] the experience is terrible. The haptics setting is not vibration for us. […] The normal function should support message classification. […] I hope the developer sees this. | Suggestion to overhaul the haptic feedback system by implementing specific vibration patterns for distinct notification categories (IM apps) and introducing granular intensity modulation to ensure accessibility. | An ideal solution would be an accessibility option that lets users adjust vibration duration, frequency, and strength levels, so tactile alerts are easier to feel for people who are hearing-impaired. |
| Xiao Ai can’t distinguish the owner’s voice from others, when I open it while watching TV, she proactively identifies the TV sound and talks nonsense […] I didn’t see any option in the settings. | The current voice activation often picks up background noise and misses my commands. It would be really helpful to add “Owner Voiceprint Isolation” to filter out sources like TV audio. | Xiao Ai doesn’t handle background noise well. I suggest adding a strict “Owner-Only” mode that verifies the owner’s voice, so it won’t be triggered by TV, videos, or other media nearby. |
| When can you release an automation scheme where if the watch judges that a person is asleep and the phone screen is off, it automatically turns off the lights? | Inquiry about a more complex automation setup: triggering an automatic lights off routine when my wearable detects I’m asleep and my phone screen is locked. | I’d like a smoother ecosystem feature where the system controls the lights by combining my watch’s sleep status with my phone’s screen lock status. |
| Model Configuration | Precision | Recall | F1 Score | PR-AUC |
|---|---|---|---|---|
| Baselines | ||||
| Model 1 | 0.5347 | 0.5294 | 0.5320 | 0.5607 |
| Model 2 | 0.6762 | 0.6961 | 0.6860 | 0.6909 |
| Model 3 | 0.8602 | 0.7843 | 0.8205 | 0.8958 |
| Proposed Framework | ||||
| Model 4 | 0.9000 | 0.7941 | 0.8437 | 0.8980 |
| Model 5 | 0.8491 | 0.8824 | 0.8654 | 0.9291 |
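
The F1 scores in the table above are consistent with the standard definition of F1 as the harmonic mean of precision and recall. The sketch below recomputes them from the reported precision and recall values. The helper name `f1_score` and the dictionary `reported` are illustrative and not taken from the paper, and PR-AUC is left out because it cannot be derived from a single precision/recall pair.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "Model 1": (0.5347, 0.5294, 0.5320),
    "Model 2": (0.6762, 0.6961, 0.6860),
    "Model 3": (0.8602, 0.7843, 0.8205),
    "Model 4": (0.9000, 0.7941, 0.8437),
    "Model 5": (0.8491, 0.8824, 0.8654),
}

for name, (precision, recall, f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: reported {f1:.4f}, recomputed {recomputed:.4f}")
```

Each recomputed value agrees with the reported F1 to four decimal places. The comparison also shows that Model 5 trades some precision relative to Model 4 for a higher recall.
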
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, J.; Wu, Q. From UGC to Brand Product Improvement: Mining Consumer Innovation Insights Across Social Media Platforms. J. Theor. Appl. Electron. Commer. Res. 2026, 21, 64. https://doi.org/10.3390/jtaer21020064
