1. Introduction
User reviews on Yelp influence consumer decisions, yet naive star averaging ignores textual nuance, temporal drift, and manipulation risk. An auditable solution should (1) integrate textual sentiment with stars; (2) reduce the impact of stale or suspicious evidence; (3) expose explanations; and (4) be reproducible and policy-compliant.
Contributions:
Scrape2Repute provides (i) an end-to-end pipeline with minimal dependencies; (ii) a hybrid per-review label combining normalised stars with a calibrated text score; (iii) a lightweight anomaly screen; (iv) time-decayed aggregation with confidence intervals; and (v) explainability via n-gram/token attributions and aspect summaries. We release CLI/GUI entry points and an ethics and compliance checklist tailored to the data source. An overview of the end-to-end architecture is shown in Figure 1.
2. Related Work
Work on online reputation blends ratings, text, and interactions. Collaborative filtering and trust modelling improve robustness and mitigate bias across contexts [1,2]. Reputation dynamics, such as inertia, motivate temporal and adaptive treatment [3]. For web services, QoS-aware models and matrix factorisation fuse performance indicators with reputation signals [4,5]; integrity is improved by graph-based filtering of dishonest raters and credibility modelling [6,7]. Personalised reputation adapts scores to user history [8]. Our pipeline operationalises these ideas end-to-end on Yelp: calibrated text+stars fusion, unsupervised anomaly dampening, and time-decayed aggregation with uncertainty.
3. Methodology
Our pipeline comprises five stages: Acquisition (Yelp-compliant loading), Preprocessing (normalisation, dedup), Modelling (text classifier + calibration; hybrid fusion), Screening (unsupervised anomalies), and Aggregation (time decay with uncertainty).
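For concreteness, the stage parameters referenced throughout the paper can be collected in a small configuration object. The sketch below is illustrative only: the attribute names, default file paths, and the chunk size are our assumptions, while the half-life and suspicious-weight defaults mirror the baseline reported in Table 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineConfig:
    """Stage-level parameters; names, paths, and most defaults are illustrative assumptions."""
    review_path: str = "yelp_academic_dataset_review.json"    # Acquisition
    business_path: str = "yelp_academic_dataset_business.json"
    min_text_chars: int = 5                                    # Preprocessing: drop near-empty texts
    language_filter: Optional[str] = "en"                      # optional English-only filter
    calibration_method: str = "sigmoid"                        # Modelling: Platt ("sigmoid") or "isotonic"
    calibration_folds: int = 3
    suspicious_weight: float = 0.3                             # Screening/Aggregation: baseline SW (Table 2)
    half_life_days: float = 365.0                              # Aggregation: baseline half-life (Table 2)
    chunk_size: int = 100_000                                  # chunked inference over the review file
```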
3.1. Data Access and Normalisation
We use the Yelp Open Dataset [9], focusing on the review and business files. For each review we keep (review_id, business_id, user_id, stars, date, text). Stars are normalised to the unit interval (yielding star01), and timestamps are parsed into a common datetime representation. We remove empty/near-empty texts, strip HTML, lowercase, normalise whitespace/emoji, optionally filter to English, and deduplicate by hashing normalised text.
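A minimal sketch of this normalisation and deduplication step, assuming pandas input with the columns listed above; the near-empty threshold, the (stars - 1)/4 mapping to star01, and the SHA-1 content hash are our assumptions.

```python
import hashlib

import pandas as pd


def clean_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Normalise text and stars, drop near-empty reviews, and deduplicate by content hash."""
    out = df.copy()
    # Strip HTML tags, lowercase, and collapse whitespace.
    out["text"] = (
        out["text"]
        .astype(str)
        .str.replace(r"<[^>]+>", " ", regex=True)
        .str.lower()
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )
    # Remove empty / near-empty texts (threshold assumed).
    out = out[out["text"].str.len() >= 5]
    # Normalise 1-5 stars to [0, 1] (exact mapping assumed) and parse timestamps.
    out["star01"] = (out["stars"] - 1.0) / 4.0
    out["date"] = pd.to_datetime(out["date"])
    # Deduplicate by hashing the normalised text and keeping the first occurrence.
    text_hash = out["text"].map(lambda t: hashlib.sha1(t.encode("utf-8")).hexdigest())
    return out.loc[~text_hash.duplicated()]
```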
3.2. Text Modelling and Calibration
Let $\mathbf{x}_i$ denote a vectorised review (TF–IDF). A logistic classifier produces a calibrated positive-sentiment probability $p_i \in [0, 1]$ (Platt/isotonic). Sentence embeddings with a non-linear classifier can replace TF–IDF where resources permit.
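A sketch of the default text model, assuming scikit-learn; only the TF–IDF + logistic regression + Platt/isotonic calibration structure comes from the text, while the n-gram range, regularisation, and other settings are illustrative.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_text_model(method: str = "sigmoid") -> CalibratedClassifierCV:
    """TF-IDF features + logistic classifier, calibrated with Platt ('sigmoid') or 'isotonic'."""
    base = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=5, sublinear_tf=True),
        LogisticRegression(max_iter=1000),
    )
    # cv=3 mirrors the 3-fold calibration described in Section 4.1.
    return CalibratedClassifierCV(base, method=method, cv=3)


# Usage: fit on (texts, weak labels), then read off calibrated probabilities.
# model = build_text_model()
# model.fit(train_texts, train_labels)
# p_text = model.predict_proba(texts)[:, 1]  # column order follows model.classes_
```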
3.3. Hybrid, Anomaly, and Aggregation
An Isolation Forest estimates a soft anomaly score $a_i \in [0, 1]$ from length, repetition, burstiness, and (where legal) account cues [10]. A robustness weight $w_i$, which decreases as $a_i$ grows, attenuates suspicious reviews during aggregation. We select the associated hyperparameters via cross-validation or stability criteria.
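The exact functional forms did not survive in the recovered text, so the sketch below assumes a convex fusion of star01 with the calibrated text probability, a linear attenuation w_i = 1 - SW * a_i of the Isolation Forest score, and an exponential half-life decay for the time-aware aggregation with a normal-approximation confidence interval; the column names (hybrid, anomaly) are ours.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def anomaly_scores(features: pd.DataFrame) -> np.ndarray:
    """Soft anomaly score a_i in [0, 1] from an Isolation Forest (higher = more suspicious)."""
    forest = IsolationForest(n_estimators=200, random_state=0).fit(features)
    raw = -forest.score_samples(features)  # score_samples: higher = more normal, so negate
    return (raw - raw.min()) / (raw.max() - raw.min() + 1e-12)


def hybrid_score(star01, p_text, alpha: float = 0.5):
    """Convex fusion of normalised stars and calibrated text probability (alpha is assumed)."""
    return alpha * np.asarray(star01) + (1.0 - alpha) * np.asarray(p_text)


def aggregate(df: pd.DataFrame, now: pd.Timestamp,
              half_life_days: float = 365.0, suspicious_weight: float = 0.3) -> pd.DataFrame:
    """Time-decayed, anomaly-attenuated mean per business with a normal-approximation CI.

    Expects columns: business_id, date, hybrid, anomaly (the last two from the helpers above).
    """
    age_days = (now - df["date"]).dt.days.clip(lower=0)
    decay = 0.5 ** (age_days / half_life_days)            # exponential half-life decay
    robust = 1.0 - suspicious_weight * df["anomaly"]      # w_i = 1 - SW * a_i (assumed form)
    weights = decay * robust

    def _summarise(group: pd.DataFrame) -> pd.Series:
        w, s = group["w"], group["hybrid"]
        mean = np.average(s, weights=w)
        var = np.average((s - mean) ** 2, weights=w)
        n_eff = w.sum() ** 2 / (w ** 2).sum()             # effective sample size
        half = 1.96 * np.sqrt(var / max(n_eff, 1.0))
        return pd.Series({"score": mean, "ci_low": mean - half, "ci_high": mean + half})

    return df.assign(w=weights).groupby("business_id").apply(_summarise)
```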
4. Experiments and Results
4.1. Setup
We train a TF-IDF + logistic-regression classifier with weak supervision from extreme star ratings (high-star reviews labelled positive, low-star reviews negative) and Platt calibration (3-fold; 50% calibration split). Inference runs in chunks on the cleaned review file, followed by Bayesian reputation aggregation and anomaly estimation.
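A sketch of the weak-label construction and chunked inference, assuming a JSON-lines review file and the calibrated model sketched in Section 3.2; the star cut-offs (>= 4 positive, <= 2 negative) are assumptions, as the exact thresholds did not survive extraction.

```python
import numpy as np
import pandas as pd


def weak_labels(stars: pd.Series, pos_min: float = 4.0, neg_max: float = 2.0) -> pd.Series:
    """Label extreme-star reviews (1 = positive, 0 = negative); the middle band stays NaN and is dropped."""
    labels = pd.Series(np.nan, index=stars.index)
    labels[stars >= pos_min] = 1.0
    labels[stars <= neg_max] = 0.0
    return labels


def score_in_chunks(model, path: str, chunksize: int = 100_000):
    """Stream the cleaned JSON-lines review file and yield calibrated positive probabilities."""
    for chunk in pd.read_json(path, lines=True, chunksize=chunksize):
        p_text = model.predict_proba(chunk["text"])[:, 1]  # column order follows model.classes_
        yield chunk[["review_id", "business_id"]].assign(p_text=p_text)
```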
Data Scale and Footprint: We processed several million reviews; the 90-day split retained 7059 businesses that met the minimum early- and future-review requirements. Training (TF-IDF + LR + Platt) took ∼88 min; the sensitivity sweep ∼30 min; anomaly scoring ∼15 min; and the remaining stages only a few minutes each on a commodity workstation. These figures suggest that the pipeline is practical for routine, offline recalibration at scale.
4.2. Text Model Quality
On a held-out sample (n = 7846), accuracy is 0.875; per-class metrics are in Table 1. The positive class shows strong precision/recall; the neutral class remains challenging under weak supervision.
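The held-out evaluation behind Table 1 can be reproduced with standard scikit-learn metrics; the helper below is a sketch, not the released evaluation script.

```python
from sklearn.metrics import accuracy_score, classification_report


def held_out_report(y_true, y_pred) -> str:
    """Accuracy plus per-class precision/recall/F1, as summarised in Table 1."""
    acc = accuracy_score(y_true, y_pred)
    return f"accuracy = {acc:.3f}\n" + classification_report(y_true, y_pred, digits=3)
```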
4.3. Predictive Validity (Early → Future)
We test whether early hybrid scores forecast future stars at the business level over a 90-day horizon (7059 businesses; see Figure 2 and Figure 3a). The weighted mean future star01 is 0.7239; Pearson r = 0.7553 and Spearman ρ = 0.6737 (weighted variants were also computed); and the top-decile mean future star01 is 0.9069, well above the overall mean.
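A sketch of how these business-level statistics can be computed; the weighting variable (e.g., the number of future reviews per business) is an assumption, as the exact weights did not survive extraction.

```python
import numpy as np
import pandas as pd


def weighted_pearson(x, y, w) -> float:
    """Weighted Pearson correlation between early hybrid scores and future star01."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    return cov / np.sqrt(
        np.average((x - mx) ** 2, weights=w) * np.average((y - my) ** 2, weights=w)
    )


def decile_means(df: pd.DataFrame) -> pd.Series:
    """Mean future star01 per decile of the early hybrid score (top decile reported above)."""
    deciles = pd.qcut(df["early_hybrid"], 10, labels=False, duplicates="drop")
    return df.groupby(deciles)["future_star01"].mean()
```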
4.4. Distribution and Sensitivity
Figure 3 summarises predictive validity (panel a) and the distribution of Bayesian reputation scores (panel b; mode ∼70).
We sweep the decay half-life (HL), the suspicious-review weight (SW), and related aggregation hyperparameters. Table 2 contrasts the baseline (HL = 365, SW = 0.3) with the best settings and, as a practical recommendation, suggests HL in the 540–720 day range together with a moderate SW.
Baseline contrast: A stars-only aggregator is included as a baseline; the hybrid shows better decile calibration (Figure 2) and higher predictive validity across the sweep (Table 2).
Reproducibility: Code, configs, fixed seeds, and artefacts enable end-to-end replication.
Practical impact: The pipeline supports periodic offline recalibration and lightweight deployment for marketplace ranking, vendor monitoring, and early-warning alerts.
5. Ethics and Compliance
We use the Yelp Open Dataset under its research terms [9]; we avoid PII, apply rate limiting, and honour robots.txt and site policies.
6. Conclusions
Scrape2Repute offers an auditable pathway from Yelp reviews to a time-aware, anomaly-robust reputation score with explanations. Predictive validity is strong (Pearson r = 0.7553 at the 90-day horizon) and stable across decay/anomaly settings. Future work includes multilingual aspect extraction, category-specific calibration, and user studies on explanation usefulness.
Author Contributions
Conceptualisation, E.S.T., R.I.M.V., A.L.S.O. and L.J.G.V.; methodology, E.S.T., A.L.S.O. and L.J.G.V.; validation, E.S.T., A.L.S.O. and L.J.G.V.; investigation, E.S.T., R.I.M.V., A.L.S.O. and L.J.G.V. All authors have read and agreed to the published version of the manuscript.
Funding
This research is supported by the PRIVATEER EU project, Grant Agreement No. 101096110; by the Programme UNICO-5G I+D of the Spanish Ministerio de Asuntos Económicos y Transformación Digital and the European Union (NextGeneration EU) in the framework of the “Plan de Recuperación, Transformación y Resiliencia”, under reference “TRAZA5G (TSI-063000-2021-0050)”; by the Recovery, Transformation and Resilience Plan, financed by the European Union (NextGeneration EU), through the Chair “Cybersecurity for Innovation and Digital Protection” INCIBE-UCM; and by the Comunidad Autónoma de Madrid, CIRMA-CM Project (TEC-2024/COM-404). The content of this article does not reflect the official opinion of the European Union. Responsibility for the information and views expressed therein lies entirely with the authors.
Data Availability Statement
The dataset analysed in this study is the Yelp Open Dataset [9].
Conflicts of Interest
The authors declare no conflict of interest.
References
- Moreno, N.; Pérez-Vereda, A.; Vallecillo, A. Managing reputation in collaborative social computing applications. J. Object Technol. 2022, 21, 3. [Google Scholar] [CrossRef]
- Muslim, H.S.M.; Rubab, S.; Khan, M.M.; Iltaf, N.; Bashir, A.K.; Javed, K. S-RAP: Relevance-aware QoS prediction in web-services and user contexts. Knowl. Inf. Syst. 2022, 64, 1997–2022. [Google Scholar] [CrossRef]
- Duradoni, M.; Gronchi, G.; Bocchi, L.; Guazzini, A. Reputation matters the most: The reputation inertia effect. Hum. Behav. Emerg. Technol. 2020, 2, 71–81. [Google Scholar] [CrossRef]
- Xu, J.; Chen, Y.; Zhu, C. A QoS-based User Reputation Measurement Method for Web Services. In Proceedings of the 2018 International Conference on Computer Science, Electronics and Communication Engineering (CSECE 2018), Wuhan, China, 7–8 February 2018; Advances in Computer Science Research Series; Atlantis Press: Paris, France, 2018; pp. 470–473. [Google Scholar] [CrossRef]
- Ghafouri, S.H.; Hashemi, S.M.; Razzazi, M.R.; Movaghar, A. Web service quality of service prediction via regional reputation-based matrix factorization. Concurr. Comput. Pract. Exp. 2021, 33, e6318. [Google Scholar] [CrossRef]
- Tibermacine, O.; Tibermacine, C.; Kerdoudi, M.L. Reputation evaluation with malicious feedback prevention using a HITS-based model. In Proceedings of the 2019 IEEE International Conference on Web Services (ICWS), Milan, Italy, 8–13 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 180–187. [Google Scholar]
- Paul, A.; Dhar, S.; Roy, S. Applying interrater reliability measure for user credibility assessment in reputation-oriented service discovery. In Proceedings of the Web Intelligence, Venice, Italy, 26–29 October 2023; SAGE Publications Sage: London, UK, 2023; Volume 21, pp. 167–180. [Google Scholar]
- Du, X.; Xu, J.; Cai, W.; Zhu, C.; Chen, Y. Oprc: An online personalized reputation calculation model in service-oriented computing environments. IEEE Access 2019, 7, 87760–87768. [Google Scholar] [CrossRef]
- Yelp Open Dataset. Available online: https://www.yelp.com/dataset (accessed on 3 February 2026).
- Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM), Washington, DC, USA, 15–19 December 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 413–422. [Google Scholar]