AdaptiveNet: A Novel Architecture for Reducing Computation Complexity to Fake Review Classification
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. Multi-Scale Semantic Fusion (MSSF) Layer Architecture
3.2. Dynamic Attention Scaling (DAS) Mechanism with Complexity Assessment
3.2.1. Dynamic Attention Head Allocation
3.2.2. Sparse Attention Pattern
Algorithm 1: DAS Complexity Assessment and Dynamic Head Allocation
Input: review token sequence T = {t_1, …, t_n}; pre-computed IDF table; embedding matrix E ∈ R^(V × D)
Output: complexity score C ∈ [0, 1]; head count h ∈ {2, 3, 4, 5, 6}; model width d_model; sparse attention mask M ∈ {0, 1}^(n × k)
// Step 1: Compute sub-scores
D_vocab ← |Set(T)| / n
sentences ← SentenceSplit(T); m ← |sentences|
lengths ← [len(s) for s in sentences]
V_length ← clip(std(lengths) / mean(lengths), 0, 1)
depths ← [MaxDepth(DependencyParse(s)) for s in sentences]
S_syntax ← mean(depths) / d_max ▷ d_max = 15
// Step 2: Composite complexity
C ← 0.4·D_vocab + 0.3·V_length + 0.3·S_syntax
// Step 3: Head allocation (piecewise)
if C < 0.3: h ← 2
else if C < 0.5: h ← 3
else if C < 0.7: h ← 4
else if C < 0.85: h ← 5
else: h ← 6
// Step 4: Dimension and sparse mask
d_model ← 32·h; d_head ← 32
k ← min(n, ⌈8·log2(n)⌉)
I_j ← ‖E[t_j]‖₂·(1 + IDF(t_j)) for j = 1, …, n
top_k_indices ← argsort(I, descending)[:k]
M ← sparse_mask(n, top_k_indices) ▷ M ∈ {0, 1}^(n × k)
return C, h, d_model, M
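Steps 1 to 4 of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation. The function names are hypothetical; sentence lengths and dependency depths are assumed to be computed beforehand (the optional spacy_features helper shows one way to do this with a loaded spaCy pipeline, counting a token's depth as its number of ancestors, so the root has depth 0); std is assumed to be the population standard deviation; and the embedding norms ‖E[t_j]‖₂ and IDF values are passed in as plain lists.

```python
import math
import statistics


def complexity_score(tokens, sentence_lengths, parse_depths, d_max=15):
    """Composite complexity C = 0.4*D_vocab + 0.3*V_length + 0.3*S_syntax (Steps 1-2)."""
    n = len(tokens)
    d_vocab = len(set(tokens)) / n                      # type-token ratio
    mean_len = statistics.mean(sentence_lengths)
    std_len = statistics.pstdev(sentence_lengths)       # population std (assumption)
    v_length = min(max(std_len / mean_len, 0.0), 1.0)   # clipped coefficient of variation
    s_syntax = statistics.mean(parse_depths) / d_max    # not clipped, mirroring Algorithm 1
    return 0.4 * d_vocab + 0.3 * v_length + 0.3 * s_syntax


def allocate_heads(c):
    """Piecewise head allocation (Step 3); returns (h, d_model) with d_head = 32."""
    for threshold, heads in ((0.3, 2), (0.5, 3), (0.7, 4), (0.85, 5)):
        if c < threshold:
            return heads, 32 * heads
    return 6, 32 * 6


def sparse_token_indices(emb_norms, idf):
    """Indices of the k most important tokens, k = min(n, ceil(8 * log2 n)) (Step 4)."""
    n = len(emb_norms)
    k = min(n, math.ceil(8 * math.log2(n)))             # k = 0 for a single-token review
    importance = [e * (1.0 + w) for e, w in zip(emb_norms, idf)]
    return sorted(range(n), key=lambda j: importance[j], reverse=True)[:k]


def spacy_features(text, nlp):
    """Hypothetical helper: sentence lengths and max dependency depths from a spaCy pipeline."""
    doc = nlp(text)
    sents = list(doc.sents)
    lengths = [len(s) for s in sents]
    depths = [max(len(list(tok.ancestors)) for tok in s) for s in sents]
    return lengths, depths


# Usage: two 4-token sentences, 6 distinct types out of 8 tokens, max depth 3 each.
tokens = "the food was great the service was slow".split()
c = complexity_score(tokens, sentence_lengths=[4, 4], parse_depths=[3, 3])
h, d_model = allocate_heads(c)   # c is about 0.36, so h = 3 and d_model = 96
```

Note that the thresholds are half-open, so a review with C exactly 0.3 receives three heads rather than two.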
3.3. Adaptive Parameter Sharing (APS) Network Flow
Algorithm 2: Adaptive Parameter Sharing (APS) Network
Input:
  X ∈ R^(B × L × D) ▷ feature tensor from DAS (B = batch size, L = sequence length, D = 256)
  W_shared = {W_shared^(i)}_{i=1}^{4} ▷ shared pool: R^(256 × 128), R^(128 × 64), R^(64 × 32), R^(32 × 16)
  W_base = {W_base^(i)}_{i=1}^{4} ▷ base weights, same dimensions as W_shared^(i)
  w_g = {w_g^(i) ∈ R^D}_{i=1}^{4} ▷ gate projection vectors
  b_g = {b_g^(i) ∈ R}_{i=1}^{4} ▷ gate bias scalars
  b = {b_i}_{i=1}^{4} ▷ layer biases: R^128, R^64, R^32, R^16
  T = 0.5 ▷ temperature controlling gate sharpness
Output:
  Y_final ∈ R^(B × 16) ▷ classification-ready representation
  G = {g_i}_{i=1}^{4} ▷ gate activations, kept for interpretability
// Step 1: Context vector via global average pooling and L2 normalisation
c ← (1/L) Σ_{l=1}^{L} X[:, l, :] ▷ c ∈ R^(B × D), mean over the sequence dimension
c ← c / ‖c‖₂ ▷ row-wise, so ‖c_b‖₂ = 1 ∀ b
// Step 2: Scalar gate computation (one gate per layer)
for i = 1 to 4 do
  z_i ← c·w_g^(i) + b_g^(i) ▷ z_i ∈ R^B; per sample, z_{i,b} = (w_g^(i))^T c_b + b_g^(i)
  g_i ← σ(z_i / T) ▷ g_i ∈ (0, 1)^B, temperature-scaled sigmoid
end for
// Step 3: Dynamic weight generation (gated interpolation)
for i = 1 to 4 do
  W_dynamic^(i) ← g_i·W_shared^(i) + (1 − g_i)·W_base^(i) ▷ formed per sample, since g_i ∈ (0, 1)^B
  ▷ W_dynamic^(1) ∈ R^(256 × 128), W_dynamic^(2) ∈ R^(128 × 64), W_dynamic^(3) ∈ R^(64 × 32), W_dynamic^(4) ∈ R^(32 × 16)
end for
// Step 4: Forward propagation with progressive dimension reduction
H1 ← ReLU(X·W_dynamic^(1) + b_1) ▷ H1 ∈ R^(B × L × 128)
H1_pool ← (1/L) Σ_{l=1}^{L} H1[:, l, :] ▷ H1_pool ∈ R^(B × 128), sequence pooling
H2 ← ReLU(H1_pool·W_dynamic^(2) + b_2) ▷ H2 ∈ R^(B × 64)
H3 ← ReLU(H2·W_dynamic^(3) + b_3) ▷ H3 ∈ R^(B × 32)
Y_final ← H3·W_dynamic^(4) + b_4 ▷ Y_final ∈ R^(B × 16), no activation
return Y_final, G = {g_1, g_2, g_3, g_4}
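The forward pass of Algorithm 2 can be sketched in NumPy (which appears in the software configuration) as below. It is a sketch under stated assumptions, not the authors' code: parameters are randomly initialised only to exercise the shapes, and the initialisation scale and the small eps in the normalisation are assumptions. Because each gate g_i is a per-sample scalar, W_dynamic^(i) differs for every review in the batch. The sketch exploits linearity, H·(g W_shared + (1 − g) W_base) = g (H W_shared) + (1 − g)(H W_base), so it never materialises B separate weight matrices; the output is identical to the explicit interpolation in Step 3.

```python
import numpy as np

DIMS = (256, 128, 64, 32, 16)  # D -> 128 -> 64 -> 32 -> 16, as listed in Algorithm 2


def init_aps_params(rng, dims=DIMS):
    """Random, untrained parameters with the shapes of Algorithm 2 (illustrative only)."""
    p = {"W_shared": [], "W_base": [], "w_g": [], "b_g": [], "b": []}
    for i in range(len(dims) - 1):
        shape = (dims[i], dims[i + 1])
        scale = 1.0 / np.sqrt(dims[i])
        p["W_shared"].append(rng.normal(0.0, scale, shape))
        p["W_base"].append(rng.normal(0.0, scale, shape))
        p["w_g"].append(rng.normal(0.0, 1.0, dims[0]))  # every gate reads the D-dim context c
        p["b_g"].append(0.0)
        p["b"].append(np.zeros(dims[i + 1]))
    return p


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(a):
    return np.maximum(a, 0.0)


def aps_forward(X, p, T=0.5, eps=1e-12):
    """X: (B, L, D). Returns Y_final with shape (B, 16) and gates G with shape (4, B)."""
    c = X.mean(axis=1)                                         # Step 1: (B, D) average pooling
    c = c / (np.linalg.norm(c, axis=1, keepdims=True) + eps)   # unit L2 norm per sample
    G = np.stack([sigmoid((c @ p["w_g"][i] + p["b_g"][i]) / T)  # Step 2: (4, B) gates
                  for i in range(4)])

    def gated_matmul(H, i):
        # Step 3 without per-sample weight tensors:
        # H @ (g*W_shared + (1-g)*W_base) == g*(H @ W_shared) + (1-g)*(H @ W_base)
        g = G[i].reshape((-1,) + (1,) * (H.ndim - 1))          # broadcast over non-batch axes
        return g * (H @ p["W_shared"][i]) + (1.0 - g) * (H @ p["W_base"][i])

    H1 = relu(gated_matmul(X, 0) + p["b"][0])                  # Step 4: (B, L, 128)
    H1_pool = H1.mean(axis=1)                                  # (B, 128)
    H2 = relu(gated_matmul(H1_pool, 1) + p["b"][1])            # (B, 64)
    H3 = relu(gated_matmul(H2, 2) + p["b"][2])                 # (B, 32)
    Y_final = gated_matmul(H3, 3) + p["b"][3]                  # (B, 16), no activation
    return Y_final, G


rng = np.random.default_rng(0)
params = init_aps_params(rng)
X = rng.normal(size=(2, 10, 256))   # a batch of 2 reviews, 10 positions each
Y_final, G = aps_forward(X, params)
```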
3.3.1. Shared Parameter Pool
3.3.2. Context-Aware Gate Mechanism
3.3.3. Dynamic Weight Generation
3.3.4. Forward Propagation Through APS Layers
4. Implementation
4.1. Dataset Preparation and Preprocessing
4.1.1. Fake Review Labelling Methodology
4.1.2. Label Verification Process
4.2. Model Configuration and Training Parameters
4.3. Baseline Model Implementation
4.4. Benchmarking Protocol
4.4.1. Inference Benchmarking Protocol
4.4.2. Energy Measurement Methodology
- (i) Before each experiment, the GPU was reset to an idle state using nvidia-smi --gpu-reset, and its power draw was verified to return to the idle baseline (approximately 55 W for the A100).
- (ii) A dedicated background thread sampled instantaneous power draw via nvmlDeviceGetPowerUsage() at 100 ms intervals, storing timestamped power values (in milliwatts) in a circular buffer.
- (iii) GPU energy (in joules) was computed by trapezoidal integration of the power–time series, with samples converted from milliwatts to watts: $E_{\mathrm{GPU}}^{\mathrm{total}} = \sum_{k=1}^{N-1} \frac{P_k + P_{k+1}}{2}\,\Delta t$, where $P_k$ is the $k$-th of $N$ power samples and $\Delta t = 0.1$ s.
- (iv) Idle power was subtracted so that only computation-attributable energy is reported: $E_{\mathrm{GPU}}^{\mathrm{net}} = E_{\mathrm{GPU}}^{\mathrm{total}} - P_{\mathrm{idle}} \times T_{\mathrm{total}}$, where $T_{\mathrm{total}}$ is the measurement duration.
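Steps (ii) to (iv) can be sketched in Python as follows. The power reader is injected as a callable so that the sketch runs without a GPU; on an NVIDIA system one would call pynvml.nvmlInit(), obtain handle = pynvml.nvmlDeviceGetHandleByIndex(0), and pass lambda: pynvml.nvmlDeviceGetPowerUsage(handle). The class and function names are hypothetical, and taking T_total as the span covered by the samples, (N − 1)Δt, is an assumption.

```python
import threading
import time
from collections import deque


class PowerSampler:
    """Polls read_power_mw() every interval_s seconds on a daemon thread.

    Samples are (timestamp_s, power_mW) pairs held in a bounded deque (circular buffer).
    """

    def __init__(self, read_power_mw, interval_s=0.1, capacity=100_000):
        self._read = read_power_mw
        self._interval = interval_s
        self.samples = deque(maxlen=capacity)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.samples.append((time.perf_counter(), self._read()))
            self._stop.wait(self._interval)  # sleeps, but wakes immediately on stop

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        return False


def trapezoid_energy_j(power_mw, dt_s=0.1):
    """E = sum_k (P_k + P_{k+1}) / 2 * dt over evenly spaced samples; mW -> W -> J."""
    watts = [p / 1000.0 for p in power_mw]
    return sum((watts[k] + watts[k + 1]) / 2.0 * dt_s for k in range(len(watts) - 1))


def net_energy_j(power_mw, idle_w, dt_s=0.1):
    """Subtract idle draw over the sampled span, with T_total = (N - 1) * dt (assumption)."""
    t_total = max(len(power_mw) - 1, 0) * dt_s
    return trapezoid_energy_j(power_mw, dt_s) - idle_w * t_total
```

In use, the workload runs inside a with PowerSampler(reader) as sampler: block, after which net_energy_j([p for _, p in sampler.samples], idle_w=55.0) gives the computation-attributable energy in joules.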
5. Results
5.1. Computational Efficiency Analysis
5.2. Component-Wise Ablation Study
5.3. Sensitivity Analysis of DAS Complexity Weights
5.3.1. Grid Search Methodology
5.3.2. Sensitivity Analysis Discussion
5.3.3. Learnable-Weight Alternative
5.4. Cross-Domain Generalization Analysis
5.4.1. Experimental Design
- (i) Amazon → Yelp: train on 500 K Amazon product reviews; test on 300 K Yelp restaurant reviews.
- (ii) Amazon → TripAdvisor: train on 500 K Amazon product reviews; test on 200 K TripAdvisor hotel reviews.
- (iii) Yelp → Amazon: train on 300 K Yelp restaurant reviews; test on 500 K Amazon product reviews.
- (iv) Yelp → TripAdvisor: train on 300 K Yelp restaurant reviews; test on 200 K TripAdvisor hotel reviews.
- (v) TripAdvisor → Amazon: train on 200 K TripAdvisor hotel reviews; test on 500 K Amazon product reviews.
- (vi) TripAdvisor → Yelp: train on 200 K TripAdvisor hotel reviews; test on 300 K Yelp restaurant reviews.
5.4.2. Cross-Domain Transfer Results
5.4.3. Analysis of Cross-Domain Results
5.4.4. Domain-Invariant Feature Analysis
5.4.5. Comparing with Domain Adaptation Baselines
- (a) BERT + Domain-Adversarial Training (DANN): a gradient reversal layer was attached to BERT-base so that, when trained on labelled source data and unlabelled target data, the model learns domain-invariant representations. Its average cross-domain accuracy was 87.2 ± 0.5%, 1.6 percentage points above vanilla BERT-base (85.6%) but 2.1 percentage points below AdaptiveNet (89.3%).
- (b) BERT + Multi-Domain Pretraining: BERT-base was first further pretrained with the masked language modelling (MLM) objective on unlabelled target-domain reviews and then fine-tuned on labelled source-domain data. Its cross-domain accuracy (mean ± confidence interval) was 88.1 ± 0.4%, narrowing the gap to AdaptiveNet to 1.2 percentage points, but at the cost of requiring unlabelled target-domain data and roughly one additional hour of A100 pretraining per target domain.
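The gradient reversal layer in baseline (a) follows the standard DANN construction: it is the identity in the forward pass and multiplies the gradient by −λ in the backward pass, so the shared encoder is pushed to confuse the domain classifier. A minimal PyTorch sketch is given below. The names GradReverse and grad_reverse, the value λ = 0.5 and the 768-to-2 domain head are illustrative assumptions, not reported settings of this baseline.

```python
import torch


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies the incoming gradient by -lambd on the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)  # identity, returned as a new view so autograd records this node

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient flowing back into the encoder;
        # lambd is a plain float, so it receives no gradient (None).
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


# Hypothetical wiring: pooled BERT features -> GRL -> source-vs-target domain classifier.
domain_head = torch.nn.Linear(768, 2)
features = torch.randn(4, 768, requires_grad=True)
domain_logits = domain_head(grad_reverse(features, lambd=0.5))
```

Because the reversal only affects gradients, the domain classifier itself is trained normally, while the encoder receives the negated signal.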
5.5. Efficiency–Performance Trade-Off Analysis
6. Comparative Analysis
6.1. Performance Comparison with State-of-the-Art Models
6.2. Computational Efficiency Comparison
6.3. Scalability and Real-World Deployment Analysis
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| MSSF | Multi-Scale Semantic Fusion |
| DAS | Dynamic Attention Scaling |
| APS | Adaptive Parameter Sharing |
| BiLSTM (Bi-LSTM) | Bidirectional Long Short-Term Memory |
| CNN | Convolutional Neural Network |
| NLP | Natural Language Processing |
| RNN | Recurrent Neural Network |
| GPU | Graphics Processing Unit |
| LSTM | Long Short-Term Memory |
| BERT | Bidirectional Encoder Representations from Transformers |
| HTML | Hypertext Markup Language |
| AUC-ROC | Area Under the Receiver Operating Characteristic Curve |
| RoBERTa | Robustly Optimized BERT Pretraining Approach |
References
- Hoo, W.C.; Cheng, A.Y.; Ng, A.H.H.; Bakar, S.M.B.S.A. Factors influencing consumer behaviour towards online purchase intention on popular shopping platforms in Malaysia. WSEAS Trans. Bus. Econ. 2024, 21, 544–553.
- Al-Tai, M.; Nema, B.; Al-Sherbaz, A. Deep learning for fake news detection: Literature review. Al-Mustansiriyah J. Sci. 2023, 34, 70–81.
- Saini, P.; Khatarkar, V. Machine learning techniques for identifying fake news: An overview. Smart Moves J. Ijoscience 2023, 9, 1–5.
- Sangeetha, S.; Sangeetha, B.; Kumar, R.; Shevannth, R.; Krishna Prasath, S.; Mohammed Rafi, M. Fake review detection using deep learning. In Artificial Intelligence and Communication Technologies; SCRS: Delhi, India, 2023; pp. 655–668.
- Zaki, N.; Krishnan, A.; Turaev, S.; Rustamov, Z.; Rustamov, J.; Almusalami, A.; et al. Node embedding approach for accurate detection of fake reviews: A graph-based machine learning approach with explainable AI. Research Square 2023.
- Polpolage, S. Fake review detection in Yelp restaurant reviews via natural language processing. Research Square 2025.
- Li, F.; Huang, M.; Yang, Y.; Zhu, X. Learning to identify review spam. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011; pp. 2488–2493.
- Mukherjee, A.; Venkataraman, V.; Liu, B.; Glance, N. What Yelp fake review filter might be doing? In Proceedings of the International AAAI Conference on Web and Social Media, Boston, MA, USA, 8–11 July 2013; pp. 409–418.
- Rayana, S.; Akoglu, L. Collective opinion spam detection: Bridging review networks and metadata. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 985–994.
- Kim, Y. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1746–1751.
- Zhang, X.; Zhao, J.; LeCun, Y. Character-level convolutional networks for text classification. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28, pp. 649–657.
- Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Siuda, P.; Behnke, M.; Hedlund, D. Detecting fake reviews: Just a matter of data. In Proceedings of the 56th Hawaii International Conference on System Sciences, Maui, HI, USA, 3–6 January 2023.
- Hu, B.; Mao, Z.; Zhang, Y. An overview of fake news detection: From a new perspective. Fundam. Res. 2025, 5, 332–346.
- Zaki, N.; Krishnan, A.; Turaev, S.; Rustamov, Z.; Rustamov, J.; Almusalami, A.; et al. Node embedding approach for accurate detection of fake reviews: A graph-based machine learning approach with explainable AI. Int. J. Data Sci. Anal. 2024, 18, 295–315.
- Sun, P.; Bi, W.; Zhang, Y.; Wang, Q.; Kou, F.; Lu, T.; et al. Fake review detection model based on comment content and review behavior. Electronics 2024, 13, 4322.
- Kalbhor, S.; Goyal, D.; Sankhla, K. Taming misinformation: Fake review detection on social media platform using hybrid ensemble technique. Int. J. Electr. Electron. Res. 2024, 12, 27–33.
- Ren, Y.; Zhang, J.; Wang, H.; Li, X. Tensor factorization with sparse and graph regularization for fake news detection on social networks. IEEE Trans. Comput. Soc. Syst. 2024, 11, 3144–3155.
- Zhang, X.; Guo, F.; Chen, T.; Pan, L.; Beliakov, G.; Wu, J. A brief survey of machine learning and deep learning techniques for e-commerce research. J. Theor. Appl. Electron. Commer. Res. 2023, 18, 2188–2216.
- Dao, T. FlashAttention-2: Faster attention with better parallelism and work partitioning. In Proceedings of the 12th International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024.
- Ainslie, J.; Lee-Thorp, J.; de Jong, M.; Zemlyanskiy, Y.; Lebrón, F.; Sanghai, S. GQA: Training generalized multi-query transformer models from multi-head checkpoints. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, 6–10 December 2023; pp. 4895–4901.
- Jayasinghe, J.; Dassanayaka, S. Detecting deception: Employing deep neural networks for fraudulent review detection on Amazon. Research Square 2024.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. In Proceedings of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing (EMC2), Vancouver, BC, Canada, 13 December 2019.
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A lite BERT for self-supervised learning of language representations. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020.
- Wang, S.; Li, B.Z.; Khabsa, M.; Fang, H.; Ma, H. Linformer: Self-attention with linear complexity. arXiv 2020, arXiv:2006.04768.
- Choromanski, K.; Likhosherstov, V.; Dohan, D.; Song, X.; Gane, A.; Sarlos, T.; Hawkins, P.; Davis, J.; Mohiuddin, A.; Kaiser, L.; et al. Rethinking attention with performers. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
- Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 1135–1143.
- Jacob, Y.; Dupont, E.; Tuytelaars, T. Deep quantization: Encoding convolutional activations with deep generative model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5456–5465.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 6105–6114.
- Jiao, X.; Yin, L.; Shang, L.; Jiang, X.; Chen, X.; Li, L.; Wang, F.; Liu, Q. TinyBERT: Distilling BERT for natural language understanding. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Virtual Event, 16–20 November 2020; pp. 4163–4174.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
- Sun, Z.; Yu, H.; Song, X.; Liu, R.; Yang, Y.; Zhou, D. MobileBERT: A compact task-agnostic BERT for resource-limited devices. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), Virtual Event, 5–10 July 2020; pp. 2158–2170.
- He, P.; Gao, J.; Chen, W. DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
- Wang, Z.; Yao, A.; Xu, G.; Ren, M. A large language model-based approach for fake review detection: The implicit characteristics perspective. Inf. Process. Manag. 2026, 63, 104352.
- Xu, C.; McAuley, J. A survey on model compression and acceleration for pretrained language models. Proc. AAAI Conf. Artif. Intell. 2024, 38, 19439–19447.
- McAuley, J.; Leskovec, J. Amazon Product Data. Available online: https://www.kaggle.com/datasets/snap/amazon-fine-food-reviews (accessed on 15 December 2024).
- He, R.; McAuley, J. Amazon Review Data (2018). Available online: https://nijianmo.github.io/amazon/index.html (accessed on 15 December 2024).
- Yelp Inc. Yelp Open Dataset. Available online: https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset (accessed on 15 December 2024).
- Anderson, M. Yelp Restaurant Reviews Dataset. Available online: https://www.kaggle.com/datasets/omkarsabnis/yelp-reviews-dataset (accessed on 15 December 2024).
- Mishra, A. TripAdvisor Hotel Reviews. Available online: https://www.kaggle.com/datasets/andrewmvd/trip-advisor-hotel-reviews (accessed on 15 December 2024).
- DataFiniti Inc. Hotel Reviews Data. Available online: https://www.kaggle.com/datasets/datafiniti/hotel-reviews (accessed on 15 December 2024).
- Jindal, N.; Liu, B. Opinion spam and analysis. In Proceedings of the 2008 International Conference on Web Search and Data Mining, Stanford, CA, USA, 11–12 February 2008; pp. 219–230.
- Fakespot Inc. Fakespot Analyzer. Available online: https://www.fakespot.com (accessed on 10 January 2025).
- Luca, M.; Zervas, G. Fake it till you make it: Reputation, competition, and Yelp review fraud. Manag. Sci. 2016, 62, 3412–3427.
- Ott, M.; Choi, Y.; Cardie, C.; Hancock, J.T. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, OR, USA, 19–24 June 2011; pp. 309–319.
- Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46.
- Newman, M.L.; Pennebaker, J.W.; Berry, D.S.; Richards, J.M. Lying words: Predicting deception from linguistic styles. Personal. Soc. Psychol. Bull. 2003, 29, 665–675.

| Component | Specification |
|---|---|
| GPU | NVIDIA A100 SXM4 (40 GB HBM2, 6912 CUDA cores, 432 Tensor cores, 1.41 GHz boost clock, 1555 GB/s memory bandwidth) |
| CPU | Intel Xeon Gold 6248R (3.0 GHz base/4.0 GHz turbo, 24 cores/48 threads, 35.75 MB L3 cache) |
| System Memory | 128 GB DDR4-3200 ECC (4 × 32 GB, quad-channel) |
| Storage | 1 TB Samsung 980 PRO NVMe SSD (sequential read: 7000 MB/s) |
| Interconnect | PCIe Gen4 x16 (GPU–CPU), NVLink 3.0 (600 GB/s aggregate bandwidth) |
| Power Supply | 2000 W, 80 PLUS Platinum (for stable power delivery during measurement) |
| Cooling | Liquid cooling; ambient temperature maintained at 22 ± 1 °C throughout experiments |
| Operating System | Ubuntu 20.04.6 LTS (kernel 5.4.0-150-generic) |
| CUDA Toolkit | CUDA 11.8 (driver 520.61.05) |
| cuDNN | cuDNN 8.6.0 for CUDA 11.x |
| Python | Python 3.10.12 (CPython) |
| PyTorch | PyTorch 2.0.1 + cu118 (torch.compile not used, for a fair comparison) |
| Hugging Face Transformers | Transformers 4.30.2 (baseline model loading and fine-tuning) |
| Tokenizers | Tokenizers 0.13.3 (Hugging Face fast tokenizers) |
| NVML Interface | pynvml 11.5.0 (GPU power measurement) |
| RAPL Interface | pyRAPL 0.2.3.1 (CPU energy measurement) |
| spaCy | spaCy 3.5.3 with en_core_web_sm (DAS syntactic parsing) |
| NumPy/SciPy | NumPy 1.24.3, SciPy 1.10.1 |
| Random Seeds | torch.manual_seed(s), numpy.random.seed(s), s ∈ {42, 123, 256, 512, 1024} |

| Model | Amazon (%) | Yelp (%) | TripAdvisor (%) | Average (%) |
|---|---|---|---|---|
| AdaptiveNet | 95.2 | 94.6 | 94.5 | 94.8 |
| BERT-base [25] | 92.4 | 91.8 | 92.1 | 92.1 |
| RoBERTa [37] | 92.1 | 91.2 | 92.1 | 91.8 |
| DistilBERT [26] | 90.8 | 89.9 | 90.5 | 90.4 |
| MobileBERT [38] | 88.7 | 87.9 | 88.2 | 88.3 |
| TinyBERT [36] | 86.5 | 85.8 | 86.1 | 86.1 |
| CNN [10] | 85.2 | 84.6 | 84.9 | 84.9 |
| BiLSTM [11] | 83.9 | 83.1 | 83.5 | 83.5 |

| Model | Memory (MB) | Training Time (min) | Inference (ms/sample) | Params (M) | Training Energy (kWh) | Inference Energy (mJ/sample) | FLOPs (G) | Precision |
|---|---|---|---|---|---|---|---|---|
| AdaptiveNet | 187 ± 3 | 45 ± 2.1 | 12.0 ± 0.3 | 2.1 | 0.82 ± 0.04 | 14.2 ± 0.6 | 20.5 | FP32 |
| AdaptiveNet (FP16) | 112 ± 2 | 31 ± 1.8 | 7.4 ± 0.2 | 2.1 | 0.56 ± 0.03 | 8.8 ± 0.4 | 20.5 | FP16 |
| BERT-base [25] | 1340 ± 8 | 385 ± 11.2 | 51.0 ± 1.4 | 110 | 7.24 ± 0.31 | 62.8 ± 2.1 | 21,800 | FP32 |
| RoBERTa [37] | 1285 ± 10 | 412 ± 14.6 | 34.0 ± 1.1 | 125 | 7.86 ± 0.38 | 42.4 ± 1.8 | 24,500 | FP32 |
| DistilBERT [26] | 745 ± 5 | 198 ± 7.3 | 28.0 ± 0.8 | 66 | 3.61 ± 0.18 | 34.2 ± 1.3 | 11,300 | FP32 |
| MobileBERT [38] | 425 ± 4 | 156 ± 5.8 | 22.0 ± 0.6 | 25 | 2.78 ± 0.14 | 26.8 ± 1.0 | 5700 | FP32 |
| TinyBERT [36] | 298 ± 3 | 89 ± 3.4 | 18.0 ± 0.5 | 14.5 | 1.52 ± 0.08 | 21.4 ± 0.8 | 3200 | FP32 |
| CNN [10] | 234 ± 2 | 67 ± 2.8 | 15.0 ± 0.4 | 3.2 | 1.14 ± 0.06 | 17.8 ± 0.7 | 410 | FP32 |
| BiLSTM [11] | 412 ± 4 | 98 ± 3.9 | 25.0 ± 0.7 | 5.8 | 1.76 ± 0.09 | 30.2 ± 1.1 | 890 | FP32 |

| Configuration | Accuracy (%) | Memory (MB) | Parameters (M) | Inference (ms) |
|---|---|---|---|---|
| Complete AdaptiveNet | 94.8 | 187 | 2.1 | 12 |
| Without MSSF | 92.5 | 165 | 1.8 | 10 |
| Without DAS | 93.0 | 252 | 2.1 | 16 |
| Without APS | 93.3 | 187 | 4.6 | 12 |
| Without MSSF + DAS | 91.0 | 234 | 1.8 | 14 |
| Without DAS + APS | 91.7 | 298 | 4.6 | 18 |
| Without MSSF + APS | 90.8 | 165 | 4.1 | 10 |
| CNN Baseline | 84.9 | 234 | 3.2 | 15 |

| No. | α | β | γ | Val. Accuracy (%) | Avg. Heads | FLOPs (G) | Inference (ms) |
|---|---|---|---|---|---|---|---|
| 1 | 0.1 | 0.1 | 0.8 | 93.21 ± 0.18 | 3.8 | 23.1 | 13.4 |
| 2 | 0.1 | 0.2 | 0.7 | 93.38 ± 0.15 | 3.7 | 22.8 | 13.2 |
| 3 | 0.1 | 0.3 | 0.6 | 93.52 ± 0.14 | 3.6 | 22.4 | 13.0 |
| 4 | 0.1 | 0.4 | 0.5 | 93.41 ± 0.16 | 3.5 | 22.1 | 12.8 |
| 5 | 0.1 | 0.5 | 0.4 | 93.29 ± 0.19 | 3.4 | 21.8 | 12.6 |
| 6 | 0.2 | 0.2 | 0.6 | 93.68 ± 0.13 | 3.5 | 22.0 | 12.9 |
| 7 | 0.2 | 0.3 | 0.5 | 93.84 ± 0.12 | 3.4 | 21.6 | 12.7 |
| 8 | 0.2 | 0.4 | 0.4 | 93.71 ± 0.14 | 3.3 | 21.3 | 12.5 |
| 9 | 0.2 | 0.5 | 0.3 | 93.55 ± 0.17 | 3.2 | 21.0 | 12.3 |
| 10 | 0.3 | 0.2 | 0.5 | 94.12 ± 0.11 | 3.3 | 21.2 | 12.5 |
| 11 | 0.3 | 0.3 | 0.4 | 94.36 ± 0.09 | 3.2 | 20.9 | 12.3 |
| 12 | 0.3 | 0.4 | 0.3 | 94.28 ± 0.10 | 3.1 | 20.6 | 12.1 |
| 13 | 0.4 | 0.2 | 0.4 | 94.51 ± 0.08 | 3.2 | 20.8 | 12.2 |
| 14 | 0.4 | 0.3 | 0.3 | 94.78 ± 0.07 | 3.1 | 20.5 | 12.0 |
| 15 | 0.5 | 0.2 | 0.3 | 94.62 ± 0.09 | 3.0 | 20.2 | 11.9 |

| Configuration | Converged Weights (α, β, γ) | Val. Accuracy (%) | Test Accuracy (%) |
|---|---|---|---|
| Fixed (grid-searched) | 0.400, 0.300, 0.300 | 94.78 ± 0.07 | 94.81 ± 0.09 |
| Learnable (seed 42) | 0.382, 0.312, 0.306 | 94.71 ± 0.08 | 94.74 ± 0.10 |
| Learnable (seed 123) | 0.391, 0.298, 0.311 | 94.75 ± 0.07 | 94.78 ± 0.09 |
| Learnable (seed 256) | 0.377, 0.318, 0.305 | 94.68 ± 0.09 | 94.72 ± 0.11 |
| Learnable (mean ± std) | 0.383 ± 0.007, 0.309 ± 0.010, 0.307 ± 0.003 | 94.71 ± 0.04 | 94.75 ± 0.03 |

| Model | Am→Ye | Am→TA | Ye→Am | Ye→TA | TA→Am | TA→Ye | Average |
|---|---|---|---|---|---|---|---|
| AdaptiveNet | 90.4 ± 0.4 | 89.8 ± 0.5 | 88.7 ± 0.5 | 90.1 ± 0.4 | 87.5 ± 0.6 | 89.2 ± 0.5 | 89.3 ± 0.3 |
| BERT-base [25] | 86.8 ± 0.6 | 86.2 ± 0.7 | 84.9 ± 0.7 | 86.5 ± 0.6 | 83.4 ± 0.8 | 85.6 ± 0.7 | 85.6 ± 0.4 |
| RoBERTa [37] | 86.1 ± 0.7 | 85.5 ± 0.8 | 84.2 ± 0.8 | 85.8 ± 0.7 | 82.8 ± 0.9 | 84.9 ± 0.8 | 84.9 ± 0.5 |
| DistilBERT [26] | 84.2 ± 0.8 | 83.8 ± 0.9 | 82.5 ± 0.9 | 84.0 ± 0.8 | 81.2 ± 1.0 | 83.1 ± 0.9 | 83.1 ± 0.6 |
| MobileBERT [38] | 82.1 ± 0.9 | 81.6 ± 1.0 | 80.4 ± 1.0 | 81.8 ± 0.9 | 79.1 ± 1.1 | 81.0 ± 1.0 | 81.0 ± 0.7 |
| TinyBERT [36] | 79.8 ± 1.1 | 79.2 ± 1.2 | 78.1 ± 1.2 | 79.5 ± 1.1 | 76.8 ± 1.3 | 78.6 ± 1.2 | 78.7 ± 0.8 |
| CNN [10] | 77.5 ± 1.2 | 76.9 ± 1.3 | 76.2 ± 1.3 | 77.1 ± 1.2 | 74.8 ± 1.4 | 76.4 ± 1.3 | 76.5 ± 0.9 |
| BiLSTM [11] | 75.8 ± 1.3 | 75.2 ± 1.4 | 74.5 ± 1.4 | 75.4 ± 1.3 | 73.1 ± 1.5 | 74.7 ± 1.4 | 74.8 ± 1.0 |

| Model | In-Domain Avg (%) | Cross-Domain Avg (%) | Drop (pp) | Relative Drop (%) | Transfer Ratio |
|---|---|---|---|---|---|
| AdaptiveNet | 94.8 | 89.3 | 5.5 | 5.8 | 0.942 |
| BERT-base [25] | 92.1 | 85.6 | 6.5 | 7.1 | 0.929 |
| RoBERTa [37] | 91.8 | 84.9 | 6.9 | 7.5 | 0.925 |
| DistilBERT [26] | 90.4 | 83.1 | 7.3 | 8.1 | 0.919 |
| MobileBERT [38] | 88.3 | 81.0 | 7.3 | 8.3 | 0.917 |
| TinyBERT [36] | 86.1 | 78.7 | 7.4 | 8.6 | 0.914 |
| CNN [10] | 84.9 | 76.5 | 8.4 | 9.9 | 0.901 |
| BiLSTM [11] | 83.5 | 74.8 | 8.7 | 10.4 | 0.896 |
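
The derived columns in the in-domain versus cross-domain table are consistent with Drop = in-domain − cross-domain accuracy (in percentage points), Relative Drop = 100 × Drop / in-domain accuracy, and Transfer Ratio = cross-domain / in-domain accuracy; every row reproduces under these definitions. The small helper below (a hypothetical name) recomputes the three values from the two averages so that individual rows can be checked.

```python
def transfer_metrics(in_domain_acc, cross_domain_acc):
    """Return (drop_pp, relative_drop_pct, transfer_ratio) from two accuracies given in percent."""
    drop = in_domain_acc - cross_domain_acc
    return drop, 100.0 * drop / in_domain_acc, cross_domain_acc / in_domain_acc


drop, rel, ratio = transfer_metrics(94.8, 89.3)   # AdaptiveNet row
print(f"{drop:.1f} pp, {rel:.1f}%, {ratio:.3f}")  # prints "5.5 pp, 5.8%, 0.942"
```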

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC-ROC |
|---|---|---|---|---|---|
| AdaptiveNet | 94.8 | 95.1 | 94.5 | 94.8 | 0.972 |
| BERT-base [25] | 92.1 | 92.8 | 91.6 | 92.2 | 0.945 |
| RoBERTa [37] | 91.8 | 93.2 | 90.4 | 91.8 | 0.938 |
| DistilBERT [26] | 90.4 | 91.1 | 89.8 | 90.4 | 0.925 |
| MobileBERT [38] | 88.3 | 89.2 | 87.5 | 88.3 | 0.908 |
| TinyBERT [36] | 86.1 | 87.3 | 84.9 | 86.1 | 0.892 |
| CNN [10] | 84.9 | 85.7 | 84.1 | 84.9 | 0.878 |
| BiLSTM [11] | 83.5 | 84.2 | 82.8 | 83.5 | 0.865 |

| Model | GPU Throughput (Reviews/s) | CPU Throughput (Reviews/s) | Latency (ms) | Memory Growth Rate |
|---|---|---|---|---|
| AdaptiveNet | 2340 | 156 | 4.2 | Sub-linear |
| BERT-base [25] | 485 | 23 | 18.7 | Quadratic |
| RoBERTa [37] | 412 | 19 | 21.4 | Quadratic |
| DistilBERT [26] | 856 | 45 | 12.3 | Linear |
| MobileBERT [38] | 1245 | 78 | 8.9 | Linear |
| TinyBERT [36] | 1567 | 98 | 6.8 | Sub-linear |
| CNN [10] | 1892 | 134 | 5.1 | Constant |
| BiLSTM [11] | 967 | 67 | 9.8 | Linear |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Perumalsamy, D.; Cornelius, S.R.P.; Thinakaran, R. AdaptiveNet: A Novel Architecture for Reducing Computation Complexity to Fake Review Classification. Information 2026, 17, 388. https://doi.org/10.3390/info17040388

