Journal Description
Analytics is an international, peer-reviewed, open access journal on methodologies, technologies, and applications of analytics, published quarterly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 27.4 days after submission; acceptance to publication is undertaken in 7.7 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: APC discount vouchers, optional signed peer review, and reviewer names published annually in the journal.
- Analytics is a companion journal of Mathematics.
- Journal Cluster of Information Systems and Technology: Analytics, Applied System Innovation, Cryptography, Data, Digital, Informatics, Information, Journal of Cybersecurity and Privacy, and Multimedia.
Latest Articles
System Inertia Cost Forecasting Using Machine Learning: A Data-Driven Approach for Grid Energy Trading in Great Britain
Analytics 2025, 4(4), 30; https://doi.org/10.3390/analytics4040030 - 23 Oct 2025
Abstract
            As modern power systems integrate more renewable and decentralised generation, maintaining grid stability has become increasingly challenging. This study proposes a data-driven machine learning framework for forecasting system inertia service costs—a key yet underexplored variable influencing energy trading and frequency stability in Great Britain. Using eight years (2017–2024) of National Energy System Operator (NESO) data, four models—Long Short-Term Memory (LSTM), Residual LSTM, eXtreme Gradient Boosting (XGBoost), and Light Gradient-Boosting Machine (LightGBM)—are comparatively analysed. LSTM-based models capture temporal dependencies, while ensemble methods effectively handle nonlinear feature relationships. Results demonstrate that LightGBM achieves the highest predictive accuracy, offering a robust method for inertia cost estimation and market intelligence. The framework contributes to strategic procurement planning and supports market design for a more resilient, cost-effective grid.
            Full article
        
    
        
        
(This article belongs to the Special Issue Business Analytics and Applications)
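As an illustration of the gradient-boosted side of the comparison described in this abstract, the sketch below fits a LightGBM regressor to a synthetic daily cost series using simple calendar and lag features. It is not the authors' pipeline and uses no NESO data; all feature choices and hyperparameters are assumptions, and it assumes the lightgbm and scikit-learn packages.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for a daily inertia-service cost series (NOT NESO data).
rng = np.random.default_rng(0)
idx = pd.date_range("2017-01-01", "2024-12-31", freq="D")
cost = (10 + 3 * np.sin(2 * np.pi * idx.dayofyear / 365)
        + rng.normal(0, 1, len(idx)).cumsum() * 0.05)

df = pd.DataFrame({"cost": cost}, index=idx)
df["dow"] = df.index.dayofweek           # calendar features
df["month"] = df.index.month
for lag in (1, 7, 28):                   # autoregressive lag features
    df[f"lag_{lag}"] = df["cost"].shift(lag)
df = df.dropna()

# Chronological split: train on the past, evaluate on the most recent year.
train, test = df[:"2023-12-31"], df["2024-01-01":]
features = [c for c in df.columns if c != "cost"]

model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(train[features], train["cost"])

pred = model.predict(test[features])
print("MAE on hold-out year:", mean_absolute_error(test["cost"], pred))
```

The chronological split mirrors how such a forecaster would be used in trading: the model never sees future observations during training.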
Open Access Article
Distributional CNN-LSTM, KDE, and Copula Approaches for Multimodal Multivariate Data: Assessing Conditional Treatment Effects
by Jong-Min Kim
Analytics 2025, 4(4), 29; https://doi.org/10.3390/analytics4040029 - 21 Oct 2025
Abstract
            We introduce a distributional CNN-LSTM framework for probabilistic multivariate modeling and heterogeneous treatment effect (HTE) estimation. The model jointly captures complex dependencies among multiple outcomes and enables precise estimation of individual-level conditional average treatment effects (CATEs). In simulation studies with multivariate Gaussian mixtures, the CNN-LSTM demonstrates robust density estimation and strong CATE recovery, particularly as mixture complexity increases, while classical methods such as Kernel Density Estimation (KDE) and Gaussian Copulas may achieve higher log-likelihood or coverage in simpler scenarios. On real-world datasets, including Iris and Criteo Uplift, the CNN-LSTM achieves the lowest CATE RMSE, confirming its practical utility for individualized prediction, although KDE and Gaussian Copula approaches may perform better on global likelihood or coverage metrics. These results indicate that the CNN-LSTM can be trained efficiently on moderate-sized datasets while maintaining stable predictive performance. Overall, the framework is particularly valuable in applications requiring accurate individual-level effect estimation and handling of multimodal heterogeneity—such as personalized medicine, economic policy evaluation, and environmental risk assessment—with its primary strength being superior CATE recovery under complex outcome distributions, even when likelihood-based metrics favor simpler baselines.
            Full article
        
    
Open Access Article
Reservoir Computation with Networks of Differentiating Neuron Ring Oscillators
by Alexander Yeung, Peter DelMastro, Arjun Karuvally, Hava Siegelmann, Edward Rietman and Hananel Hazan
Analytics 2025, 4(4), 28; https://doi.org/10.3390/analytics4040028 - 20 Oct 2025
Abstract
            Reservoir computing is an approach to machine learning that leverages the dynamics of a complex system alongside a simple, often linear, machine learning model for a designated task. While many efforts have previously focused their attention on integrating neurons, which produce an output in response to large, sustained inputs, we focus on using differentiating neurons, which produce an output in response to large changes in input. Here, we introduce a small-world graph built from rings of differentiating neurons as a Reservoir Computing substrate. We find the coupling strength and network topology that enable these small-world networks to function as an effective reservoir. The dynamics of differentiating neurons naturally give rise to oscillatory dynamics when arranged in rings, where we study their computational use in the Reservoir Computing setting. We demonstrate the efficacy of these networks in the MNIST digit recognition task, achieving comparable performance of 90.65% to existing Reservoir Computing approaches. Beyond accuracy, we conduct systematic analysis of our reservoir’s internal dynamics using three complementary complexity measures that quantify neuronal activity balance, input dependence, and effective dimensionality. Our analysis reveals that optimal performance emerges when the reservoir operates with intermediate levels of neural entropy and input sensitivity, consistent with the edge-of-chaos hypothesis, where the system balances stability and responsiveness. The findings suggest that differentiating neurons can be a potential alternative to integrating neurons and can provide a sustainable future alternative for power-hungry AI applications.
            Full article
        
    
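The reservoir-computing recipe summarized above (fixed nonlinear dynamics read out by a simple linear model) can be sketched generically. The toy below uses a ring-coupled tanh reservoir with a least-squares readout; it is not the authors' differentiating-neuron model, and the sizes, coupling strength, and task are all assumptions.

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(1)

# Toy task: reproduce the input signal delayed by 5 steps.
T, washout = 2000, 200
u = rng.uniform(-1, 1, T)
y_target = np.roll(u, 5)

# Reservoir: neurons coupled in a ring (each unit driven by its predecessor),
# a crude stand-in for the ring topologies studied in the paper.
N = 200
W = np.zeros((N, N))
W[np.arange(N), np.arange(-1, N - 1)] = 0.9     # assumed ring coupling strength
w_in = rng.uniform(-0.5, 0.5, N)

x = np.zeros(N)
states = np.zeros((T, N))
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])            # fixed, untrained dynamics
    states[t] = x

# Linear readout trained by least squares on post-washout states only.
A, b = states[washout:], y_target[washout:]
w_out, *_ = lstsq(A, b, rcond=None)
print("readout MSE:", np.mean((A @ w_out - b) ** 2))
```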
Open Access Article
Multiplicative Decomposition Model to Predict UK’s Long-Term Electricity Demand with Monthly and Hourly Resolution
by Marie Baillon, María Carmen Romano and Ekkehard Ullner
Analytics 2025, 4(4), 27; https://doi.org/10.3390/analytics4040027 - 6 Oct 2025
Abstract
            The UK electricity market is changing to adapt to Net Zero targets and respond to disruptions like the Russia–Ukraine war. This requires strategic planning to decide on the construction of new electricity generation plants for a resilient UK electricity grid. Such planning is based on forecasting the UK electricity demand long-term (from 1 year and beyond). In this paper, we propose a long-term predictive model by identifying the main components of the UK electricity demand, modelling each of these components, and combining them in a multiplicative manner to deliver a single long-term prediction. To the best of our knowledge, this study is the first to apply a multiplicative decomposition model for long-term predictions at both monthly and hourly resolutions, combining neural networks with Fourier analysis. This approach is extremely flexible and accurate, with a mean absolute percentage error of 4.16% and 8.62% in predicting the monthly and hourly electricity demand, respectively, from 2019 to 2021.
            Full article
        
    
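The central idea above is multiplicative decomposition: demand is modelled as a trend factor times a seasonal factor. The sketch below fits a linear trend and a one-harmonic Fourier seasonal factor to synthetic monthly data and recombines them multiplicatively; it is only illustrative and is not the authors' neural-network-plus-Fourier model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic monthly demand: slow downward trend times an annual seasonal factor.
months = np.arange(120.0)                        # 10 years of monthly values
true = (30 - 0.02 * months) * (1 + 0.15 * np.cos(2 * np.pi * months / 12))
demand = true * rng.normal(1.0, 0.02, months.size)

# 1) Trend component: ordinary least-squares line through the raw series.
trend = np.polyval(np.polyfit(months, demand, 1), months)

# 2) Seasonal component: one-harmonic Fourier fit to the ratio demand / trend.
X = np.column_stack([np.ones_like(months),
                     np.cos(2 * np.pi * months / 12),
                     np.sin(2 * np.pi * months / 12)])
beta, *_ = np.linalg.lstsq(X, demand / trend, rcond=None)
seasonal = X @ beta

# 3) Multiplicative recombination and in-sample MAPE.
fitted = trend * seasonal
mape = 100 * np.mean(np.abs((demand - fitted) / demand))
print(f"in-sample MAPE: {mape:.2f}%")
```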
Open Access Article
Fairness in Predictive Marketing: Auditing and Mitigating Demographic Bias in Machine Learning for Customer Targeting
by Sayee Phaneendhar Pasupuleti, Jagadeesh Kola, Sai Phaneendra Manikantesh Kodete and Sree Harsha Palli
Analytics 2025, 4(4), 26; https://doi.org/10.3390/analytics4040026 - 1 Oct 2025
Abstract
            As organizations increasingly turn to machine learning for customer segmentation and targeted marketing, concerns about fairness and algorithmic bias have become more urgent. This study presents a comprehensive fairness audit and mitigation framework for predictive marketing models using the Bank Marketing dataset. We train logistic regression and random forest classifiers to predict customer subscription behavior and evaluate their performance across key demographic groups, including age, education, and job type. Using model explainability techniques such as SHAP and fairness metrics including disparate impact and true positive rate parity, we uncover notable disparities in model behavior that could result in discriminatory targeting. We implement three mitigation strategies—reweighing, threshold adjustment, and feature exclusion—and assess their effectiveness in improving fairness while preserving business-relevant performance metrics. Among these, reweighing produced the most balanced outcome, raising the Disparate Impact Ratio for older individuals from 0.65 to 0.82 and reducing the true positive rate parity gap by over 40%, with only a modest decline in precision (from 0.78 to 0.76). We propose a replicable workflow for embedding fairness auditing into enterprise BI systems and highlight the strategic importance of ethical AI practices in building accountable and inclusive marketing technologies.
            Full article
        
    
        
        
(This article belongs to the Special Issue Business Analytics and Applications)
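Of the three mitigations named in the abstract, reweighing is the simplest to illustrate in isolation. The sketch below computes Kamiran-and-Calders-style weights that make group and label statistically independent, passes them to a logistic regression, and reports a disparate impact ratio. It uses synthetic data, not the Bank Marketing dataset, and is not the authors' exact workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Synthetic stand-in for the task: binary outcome, binary protected group flag.
n = 5000
group = rng.integers(0, 2, n)                      # e.g., 1 = "older" customers
x = rng.normal(size=(n, 3)) + group[:, None] * 0.3
y = (x[:, 0] + 0.5 * group + rng.normal(0, 1, n) > 0.8).astype(int)

# Reweighing: weight each (group, label) cell so group and label look independent.
w = np.empty(n)
for g in (0, 1):
    for lbl in (0, 1):
        mask = (group == g) & (y == lbl)
        expected = (group == g).mean() * (y == lbl).mean()
        w[mask] = expected / mask.mean()

clf = LogisticRegression(max_iter=1000)
clf.fit(x, y, sample_weight=w)                     # weights passed into training

# Disparate impact ratio: P(pred = 1 | group = 1) / P(pred = 1 | group = 0).
pred = clf.predict(x)
di = pred[group == 1].mean() / pred[group == 0].mean()
print("disparate impact ratio after reweighing:", round(di, 3))
```

Because reweighing only changes sample weights, the same model class and features are kept, which is why it tends to cost little predictive performance.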
Open Access Review
Evolution of Cybercrime—Key Trends, Cybersecurity Threats, and Mitigation Strategies from Historical Data
by Muhammad Abdullah, Muhammad Munib Nawaz, Bilal Saleem, Maila Zahra, Effa binte Ashfaq and Zia Muhammad
Analytics 2025, 4(3), 25; https://doi.org/10.3390/analytics4030025 - 18 Sep 2025
Abstract
            The landscape of cybercrime has undergone significant transformations over the past decade. Present-day threats include AI-generated attacks, deep fakes, 5G network vulnerabilities, cryptojacking, and supply chain attacks, among others. To remain resilient against contemporary threats, it is essential to examine historical data to gain insights that can inform cybersecurity strategies, policy decisions, and public awareness campaigns. This paper presents a comprehensive analysis of the evolution of cyber trends in state-sponsored attacks over the past 20 years, based on the Council on Foreign Relations state-sponsored cyber operations dataset (2005–present). The study explores the key trends, patterns, and demographic shifts in cybercrime victims, the evolution of complaints and losses, and the most prevalent cyber threats over the years. It also investigates the geographical distribution, the gender disparity in victimization, the temporal peaks of specific scams, and the most frequently reported internet crimes. The findings reveal a transitional cyber landscape, with cyber threats becoming more sophisticated and monetized. Finally, the article proposes areas for further exploration through a comprehensive analysis. It provides a detailed chronicle of the trajectory of cybercrime, offering insights into its past, present, and future.
            Full article
        
    
Open Access Article
Meta-Analysis of Artificial Intelligence’s Influence on Competitive Dynamics for Small- and Medium-Sized Financial Institutions
by Macy Cudmore and David Mattie
Analytics 2025, 4(3), 24; https://doi.org/10.3390/analytics4030024 - 18 Sep 2025
Abstract
            Artificial intelligence adoption in financial services presents uncertain implications for competitive dynamics, particularly for smaller institutions. The literature on AI in finance is growing, but there remains a notable absence regarding the impacts on small- and medium-sized financial services firms. We conduct a meta-analysis combining a systematic literature review, sentiment bibliometrics, and network analysis to examine how AI is transforming competition across different firm sizes in the financial sector. Our analysis of 160 publications reveals predominantly positive academic sentiment toward AI in finance (mean positive sentiment 0.725 versus negative 0.586, Cohen’s d = 0.790, p < 0.0001), with anticipatory sentiment increasing significantly over time [...]
    
        
        
(This article belongs to the Special Issue Business Analytics and Applications)
Open Access Article
Game-Theoretic Analysis of MEV Attacks and Mitigation Strategies in Decentralized Finance
by Benjamin Appiah, Daniel Commey, Winful Bagyl-Bac, Laurene Adjei and Ebenezer Owusu
Analytics 2025, 4(3), 23; https://doi.org/10.3390/analytics4030023 - 15 Sep 2025
Abstract
            Maximal Extractable Value (MEV) presents a significant challenge to the fairness and efficiency of decentralized finance (DeFi). This paper provides a game-theoretic analysis of the strategic interactions within the MEV supply chain, involving searchers, builders, and validators. A three-stage game of incomplete information is developed to model these interactions. The analysis derives the Perfect Bayesian Nash Equilibria for primary MEV attack vectors, such as sandwich attacks, and formally characterizes attacker behavior. The research demonstrates that the competitive dynamics of the current MEV market are best described as Bertrand-style competition, which compels rational actors to engage in aggressive extraction that reduces overall system welfare in a prisoner’s dilemma-like outcome. To address these issues, the paper proposes and evaluates mechanism design solutions, including commit–reveal schemes and threshold encryption. The potential of these solutions to mitigate harmful MEV is quantified. Theoretical models are validated against on-chain data from the Ethereum blockchain, showing a close alignment between theoretical predictions and empirically observed market behavior.
            Full article
        
    
Open Access Article
Bankruptcy Prediction Using Machine Learning and Data Preprocessing Techniques
by Kamil Samara and Apurva Shinde
Analytics 2025, 4(3), 22; https://doi.org/10.3390/analytics4030022 - 10 Sep 2025
Abstract
            Bankruptcy prediction is critical for financial risk management. This study demonstrates that machine learning models, particularly Random Forest, can substantially improve prediction accuracy compared to traditional approaches. Using data from 8262 U.S. firms (1999–2018), we evaluate Logistic Regression, SVM, Random Forest, ANN, and RNN in combination with robust data preprocessing steps. Random Forest achieved the highest prediction accuracy (~95%), far surpassing Logistic Regression (~57%). Key preprocessing steps included feature engineering of financial ratios, feature selection, class balancing using SMOTE, and scaling. The findings highlight that ensemble and deep learning models—particularly Random Forest and ANN—offer strong predictive performance, suggesting their suitability for early-warning financial distress systems.
            Full article
        
    
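The preprocessing-plus-model pattern described above (class balancing with SMOTE, scaling, then a Random Forest) can be sketched compactly. The example below runs on synthetic imbalanced data rather than the 8262-firm dataset, and the hyperparameters are assumptions; it assumes scikit-learn and imbalanced-learn are installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

# Synthetic, heavily imbalanced stand-in for bankrupt vs. healthy firms.
X, y = make_classification(n_samples=8000, n_features=18, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Scale, then oversample ONLY the training split to avoid leakage into the test set.
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr_s, y_tr)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(X_te_s), digits=3))
```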
Open Access Article
Accurate Analytical Forms of Heaviside and Ramp Function
by John Constantine Venetis
Analytics 2025, 4(3), 21; https://doi.org/10.3390/analytics4030021 - 26 Aug 2025
Abstract
            In this paper, explicit exact representations of the Unit Step Function and Ramp Function are obtained. These important functions constitute fundamental concepts of operational calculus together with digital signal processing theory and are also involved in many other areas of applied sciences and engineering practices. In particular, according to a rigorous process from the viewpoint of Mathematical Analysis, the Unit Step Function and the Ramp Function are equivalently performed as bi-parametric single-valued functions with only one constraint imposed on each parameter. The novelty of this work, when compared with other investigations concerning accurate and/or approximate forms of Unit Step Function and/or Ramp Function, is that the proposed exact formulae are not exhibited in terms of miscellaneous special functions, e.g., Gamma Function, Biexponential Function, or any other special functions, such as Error Function, Complementary Error Function, Hyperbolic Function, or Orthogonal Polynomials. In this framework, one may deduce that these formulae may be much more practical, flexible, and useful in the computational procedures that are inserted into operational calculus and digital signal processing techniques as well as other engineering practices.
            Full article
        
Open Access Article
LINEX Loss-Based Estimation of Expected Arrival Time of Next Event from HPP and NHPP Processes Past Truncated Time
by M. S. Aminzadeh
Analytics 2025, 4(3), 20; https://doi.org/10.3390/analytics4030020 - 26 Aug 2025
Abstract
            This article introduces a computational tool for Bayesian estimation of the expected time until the next event occurs in both homogeneous Poisson processes (HPPs) and non-homogeneous Poisson processes (NHPPs), following a truncated time. The estimation utilizes the linear exponential (LINEX) asymmetric loss function and incorporates both gamma and non-informative priors. Furthermore, it presents a minimax-type criterion to ascertain the optimal sample size required to achieve a specified percentage reduction in posterior risk. Simulation studies indicate that estimators employing gamma priors for both HPP and NHPP demonstrate greater accuracy compared to those based on non-informative priors and maximum likelihood estimates (MLE), provided that the proposed data-driven method for selecting hyperparameters is applied.
            Full article
        
    
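For context on the loss function named above: under the LINEX loss L(d) = exp(a·d) - a·d - 1, with d the estimation error, the Bayes estimator of a parameter is -(1/a)·ln E[exp(-a·parameter) | data]. The sketch below applies this to the rate of an HPP observed up to a truncation time with a conjugate gamma prior; all numbers are illustrative and this is only one simple instance of the setting studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated HPP: true rate lambda = 2 events per unit time, observed up to tau.
lam_true, tau = 2.0, 10.0
n_events = rng.poisson(lam_true * tau)

# Gamma prior (shape a0, rate b0) on lambda -> conjugate gamma posterior.
a0, b0 = 1.0, 1.0
a_post, b_post = a0 + n_events, b0 + tau

# LINEX Bayes estimator: -(1/a) * log E[exp(-a*lambda) | data]; the gamma MGF
# gives E[exp(-a*lambda)] = (b_post / (b_post + a)) ** a_post for a > -b_post,
# which simplifies to (a_post / a) * log(1 + a / b_post).
a = 1.5                                    # asymmetry parameter (assumed)
lam_linex = (a_post / a) * np.log(1.0 + a / b_post)
lam_posterior_mean = a_post / b_post       # squared-error estimate, for comparison

print("LINEX estimate of rate:", round(lam_linex, 3))
print("posterior-mean estimate:", round(lam_posterior_mean, 3))
print("implied expected wait to next event:", round(1.0 / lam_linex, 3))
```

With a > 0 the LINEX estimator sits below the posterior mean, reflecting that overestimation is penalized more heavily than underestimation.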
Open Access Article
A Bounded Sine Skewed Model for Hydrological Data Analysis
by Tassaddaq Hussain, Mohammad Shakil, Mohammad Ahsanullah and Bhuiyan Mohammad Golam Kibria
Analytics 2025, 4(3), 19; https://doi.org/10.3390/analytics4030019 - 13 Aug 2025
Abstract
            Hydrological time series frequently exhibit periodic trends with variables such as rainfall, runoff, and evaporation rates often following annual cycles. Seasonal variations further contribute to the complexity of these data sets. A critical aspect of analyzing such phenomena is estimating realistic return intervals, making the precise determination of these values essential. Given this importance, selecting an appropriate probability distribution is paramount. To address this need, we introduce a flexible probability model specifically designed to capture periodicity in hydrological data. We thoroughly examine its fundamental mathematical and statistical properties, including the asymptotic behavior of the probability density function (PDF) and hazard rate function (HRF), to enhance predictive accuracy. Our analysis reveals that the PDF exhibits polynomial decay as [...]
    
Open Access Article
Predictive Framework for Regional Patent Output Using Digital Economic Indicators: A Stacked Machine Learning and Geospatial Ensemble to Address R&D Disparities
by Amelia Zhao and Peng Wang
Analytics 2025, 4(3), 18; https://doi.org/10.3390/analytics4030018 - 8 Jul 2025
Abstract
            As digital transformation becomes an increasingly central focus of national and regional policy agendas, parallel efforts are intensifying to stimulate innovation as a critical driver of firm competitiveness and high-quality economic growth. However, regional disparities in innovation capacity persist. This study proposes an integrated framework in which regionally tracked digital economy indicators are leveraged to predict firm-level innovation performance, measured through patent activity, across China. Drawing on a comprehensive dataset covering 13 digital economic indicators from 2013 to 2022, this study spans core, broad, and narrow dimensions of digital development. Spatial dependencies among these indicators are assessed using global and local spatial autocorrelation measures, including Moran’s I and Geary’s C, to provide actionable insights for constructing innovation-conducive environments. To model the predictive relationship between digital metrics and innovation output, this study employs a suite of supervised machine learning techniques—Random Forest, Extreme Learning Machine (ELM), Support Vector Machine (SVM), XGBoost, and stacked ensemble approaches. Our findings demonstrate the potential of digital infrastructure metrics to serve as early indicators of regional innovation capacity, offering a data-driven foundation for targeted policymaking, strategic resource allocation, and the design of adaptive digital innovation ecosystems.
            Full article
        
    
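One concrete reading of the "stacked ensemble" mentioned above is scikit-learn's StackingRegressor over the named base learners. The sketch below substitutes scikit-learn's GradientBoostingRegressor for XGBoost and uses synthetic indicators, so it is illustrative only and not the authors' configuration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.svm import SVR
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 13 "digital economy indicators" -> a patent-output target.
X, y = make_regression(n_samples=600, n_features=13, noise=10.0, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),   # boosting stand-in
        ("svr", SVR(C=10.0)),
    ],
    final_estimator=Ridge(),          # meta-learner combining base predictions
    cv=5,
)

scores = cross_val_score(stack, X, y, cv=5, scoring="r2")
print("stacked ensemble R^2 (5-fold):", scores.mean().round(3))
```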
Open Access Article
Domestication of Source Text in Literary Translation Prevails over Foreignization
by Emilio Matricciani
Analytics 2025, 4(3), 17; https://doi.org/10.3390/analytics4030017 - 20 Jun 2025
Abstract
            Domestication is a translation theory in which the source text (to be translated) is matched to the foreign reader by erasing its original linguistic and cultural difference. This match aims at making the target text (translated text) more fluent. On the contrary, foreignization is a translation theory in which the foreign reader is matched to the source text. This paper mathematically explores the degree of domestication/foreignization in current translation practice of texts written in alphabetical languages. A geometrical representation of texts, based on linear combinations of deep–language parameters, allows us (a) to calculate a domestication index which measures how much domestication is applied to the source text and (b) to distinguish language families. An expansion index measures the relative spread around mean values. This paper reports statistics and results on translations of (a) Greek New Testament books in Latin and in 35 modern languages, belonging to diverse language families; and (b) English novels in Western languages. English and French, although attributed to different language families, mathematically almost coincide. The requirement of making the target text more fluent makes domestication, with varying degrees, universally adopted, so that a blind comparison of the same linguistic parameters of a text and its translation hardly indicates that they refer to each other.
            Full article
        
    
Open Access Article
The Classical Model of Type-Token Systems Compared with Items from the Standardized Project Gutenberg Corpus
by Martin Tunnicliffe and Gordon Hunter
Analytics 2025, 4(2), 16; https://doi.org/10.3390/analytics4020016 - 5 Jun 2025
Abstract
            We compare the “classical” equations of type-token systems, namely Zipf’s laws, Heaps’ law and the relationships between their indices, with data selected from the Standardized Project Gutenberg Corpus (SPGC). Selected items all exceed 100,000 word-tokens and are trimmed to 100,000 word-tokens each. With the most egregious anomalies removed, a dataset of 8432 items is examined in terms of the relationships between the Zipf and Heaps’ indices computed using the Maximum Likelihood algorithm. Zipf’s second (size) law indices suggest that the types vs. frequency distribution is log–log convex, with the high and low frequency indices showing weak but significant negative correlation. Under certain circumstances, the classical equations work tolerably well, though the level of agreement depends heavily on the type of literature and the language (Finnish being notably anomalous). The frequency vs. rank characteristics exhibit log–log linearity in the “middle range” (ranks 100–1000), as characterised by the Kolmogorov–Smirnov significance. For most items, the Heaps’ index correlates strongly with the low frequency Zipf index in a manner consistent with classical theory, while the high frequency indices are largely uncorrelated. This is consistent with a simple simulation.
            Full article
        
    
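As a reminder of the two laws referenced above: Zipf's rank-frequency law says frequency is roughly proportional to rank^(-alpha), and Heaps' law says the number of distinct types grows like n^beta with the number of tokens n. The toy below estimates both exponents from a synthetic token stream by log-log regression; it is not the Maximum Likelihood procedure used in the paper and the corpus is artificial.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(5)

# Toy "text": 100,000 tokens drawn from a power-law-like vocabulary distribution.
vocab = 5000
probs = 1.0 / np.arange(1, vocab + 1)
probs /= probs.sum()
tokens = rng.choice(vocab, size=100_000, p=probs)

# Zipf exponent: negative slope of log(frequency) vs. log(rank).
freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1)
zipf_alpha = -np.polyfit(np.log(ranks), np.log(freqs), 1)[0]

# Heaps exponent: slope of log(types seen so far) vs. log(tokens seen so far).
seen, types_curve = set(), []
for t in tokens:
    seen.add(t)
    types_curve.append(len(seen))
sample = np.unique(np.geomspace(1, len(tokens), 50).astype(int))
heaps_beta = np.polyfit(np.log(sample),
                        np.log(np.array(types_curve)[sample - 1]), 1)[0]

print(f"Zipf alpha ~ {zipf_alpha:.2f}, Heaps beta ~ {heaps_beta:.2f}")
```

Classical theory relates the two exponents (roughly beta close to 1/alpha), which is the kind of relationship the paper tests against corpus data.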
Open Access Article
Multiplicity Adjustments for Differences in Proportion Parameters in Multiple-Sample Misclassified Binary Data
by Dewi Rahardja
Analytics 2025, 4(2), 15; https://doi.org/10.3390/analytics4020015 - 28 May 2025
Abstract
            Generally, following an omnibus (overall equality) test, multiple pairwise comparison (MPC) tests are typically conducted as the second step in a sequential testing procedure to identify which specific pairs (e.g., proportions) exhibit significant differences. In this manuscript, we develop maximum likelihood estimation (MLE) methods to construct three different types of confidence intervals (CIs) for multiple pairwise differences in proportions, specifically in contexts where both types of misclassifications (i.e., over-reporting and under-reporting) exist in multiple-sample binomial data. Our closed-form algorithm is straightforward to implement. Consequently, when dealing with multiple sample proportions, we can readily apply MPC adjustment procedures—such as Bonferroni, Šidák, and Dunn—to address the issue of multiplicity. This manuscript advances the existing literature by extending from scenarios with only one type of misclassification to those involving both. Furthermore, we demonstrate our methods using a real-world data example.
            Full article
        
    
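For the adjustments named above: with m simultaneous intervals at family-wise level alpha, Bonferroni uses a per-comparison level of alpha/m while Šidák uses 1 - (1 - alpha)^(1/m). The sketch below builds Wald-type confidence intervals for pairwise differences in proportions under both corrections, with made-up counts and without the misclassification modelling that is the paper's actual contribution.

```python
import numpy as np
from itertools import combinations
from scipy.stats import norm

# Observed successes and sample sizes for three groups (illustrative numbers).
x = np.array([45, 60, 52])
n = np.array([100, 110, 95])
p = x / n

alpha, m = 0.05, 3                         # three pairwise comparisons
levels = {
    "Bonferroni": alpha / m,
    "Sidak": 1 - (1 - alpha) ** (1 / m),
}

for name, a in levels.items():
    z = norm.ppf(1 - a / 2)                # per-comparison critical value
    print(name)
    for i, j in combinations(range(3), 2):
        diff = p[i] - p[j]
        se = np.sqrt(p[i] * (1 - p[i]) / n[i] + p[j] * (1 - p[j]) / n[j])
        print(f"  p{i+1} - p{j+1}: {diff:+.3f} +/- {z * se:.3f}")
```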
Open Access Review
Analytical Modeling of Ancillary Items
by John Wilson
Analytics 2025, 4(2), 14; https://doi.org/10.3390/analytics4020014 - 19 May 2025
Abstract
            Airline profitability increasingly depends on the sale of ancillary items such as seat selection, baggage fees, etc. The modeling of ancillary items is becoming more important in the analytics literature. Much of the modeling is stylized and not immediately applicable. This paper contains a review of the approaches and modeling assumptions made in the literature. The focus is on the assumptions made so that models may be evaluated for how effective they are for applications and to highlight gaps in the literature.
            Full article
        
    
Open Access Systematic Review
Artificial Intelligence Applied to the Analysis of Biblical Scriptures: A Systematic Review
by Bruno Cesar Lima, Nizam Omar, Israel Avansi and Leandro Nunes de Castro
Analytics 2025, 4(2), 13; https://doi.org/10.3390/analytics4020013 - 11 Apr 2025
Abstract
            The Holy Bible is the most read book in the world, originally written in Aramaic, Hebrew, and Greek over a time span in the order of centuries by many people, and formed by a combination of various literary styles, such as stories, prophecies, poetry, instructions, and others. As such, the Bible is a complex text to be analyzed by humans and machines. This paper provides a systematic survey of the application of Artificial Intelligence (AI) and some of its subareas to the analysis of the Biblical scriptures. Emphasis is given to what types of tasks are being solved, what are the main AI algorithms used, and their limitations. The findings deliver a general perspective on how this field is being developed, along with its limitations and gaps. This research follows a procedure based on three steps: planning (defining the review protocol), conducting (performing the survey), and reporting (formatting the report). The results obtained show there are seven main tasks solved by AI in the Bible analysis: machine translation, authorship identification, part of speech tagging (PoS tagging), semantic annotation, clustering, categorization, and Biblical interpretation. Also, the classes of AI techniques with better performance when applied to Biblical text research are machine learning, neural networks, and deep learning. The main challenges in the field involve the nature and style of the language used in the Bible, among others.
            Full article
        
    
Open Access Feature Paper Article
Traffic Prediction with Data Fusion and Machine Learning
by Juntao Qiu and Yaping Zhao
Analytics 2025, 4(2), 12; https://doi.org/10.3390/analytics4020012 - 9 Apr 2025
Cited by 3
Abstract
            Traffic prediction, as a core task to alleviate urban congestion and optimize the transport system, has limitations in the integration of multimodal data, making it difficult to comprehensively capture the complex spatio-temporal characteristics of the transport system. Although some studies have attempted to introduce multimodal data, they mostly rely on resource-intensive deep neural network architectures, which have difficulty meeting the demands of practical applications. To this end, we propose a traffic prediction framework based on simple machine learning techniques that effectively integrates property features, amenity features, and emotion features (PAE features). Validated with large-scale real datasets, the method demonstrates excellent prediction performance while significantly reducing computational complexity and deployment costs. This study demonstrates the great potential of simple machine learning techniques in multimodal data fusion, provides an efficient and practical solution for traffic prediction, and offers an effective alternative to resource-intensive deep learning methods, opening up new paths for building scalable traffic prediction systems.
            Full article
        
    
Open Access Article
Copula-Based Bayesian Model for Detecting Differential Gene Expression
by Prasansha Liyanaarachchi and N. Rao Chaganty
Analytics 2025, 4(2), 11; https://doi.org/10.3390/analytics4020011 - 3 Apr 2025
Abstract
            Deoxyribonucleic acid, more commonly known as DNA, is a fundamental genetic material in all living organisms, containing thousands of genes, but only a subset exhibit differential expression and play a crucial role in diseases. Microarray technology has revolutionized the study of gene expression, with two primary types available for expression analysis: spotted cDNA arrays and oligonucleotide arrays. This research focuses on the statistical analysis of data from spotted cDNA microarrays. Numerous models have been developed to identify differentially expressed genes based on the red and green fluorescence intensities measured using these arrays. We propose a novel approach using a Gaussian copula model to characterize the joint distribution of red and green intensities, effectively capturing their dependence structure. Given the right-skewed nature of the intensity distributions, we model the marginal distributions using gamma distributions. Differentially expressed genes are identified using the Bayes estimate under our proposed copula framework. To evaluate the performance of our model, we conduct simulation studies to assess parameter estimation accuracy. Our results demonstrate that the proposed approach outperforms existing methods reported in the literature. Finally, we apply our model to Escherichia coli microarray data, illustrating its practical utility in gene expression analysis.
            Full article
        
    
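The modelling ingredients described above, gamma marginals for the red and green intensities tied together by a Gaussian copula, can be sketched as a forward simulation. This is not the authors' Bayesian estimation procedure, and every parameter value below is an assumption; it only shows how the copula couples the two skewed channels.

```python
import numpy as np
from scipy.stats import gamma, norm

rng = np.random.default_rng(6)

# Assumed marginal parameters for "red" and "green" intensities and a copula
# correlation rho describing their dependence.
shape_r, scale_r = 2.0, 300.0
shape_g, scale_g = 2.5, 250.0
rho = 0.7

# Gaussian copula sampling: correlated normals -> uniforms -> gamma quantiles.
n = 10_000
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
u = norm.cdf(z)
red = gamma.ppf(u[:, 0], a=shape_r, scale=scale_r)
green = gamma.ppf(u[:, 1], a=shape_g, scale=scale_g)

# The dependence survives the nonlinear marginal transforms.
log_ratio = np.log2(red / green)            # the usual cDNA expression statistic
print("corr(red, green):", round(np.corrcoef(red, green)[0, 1], 3))
print("mean log2 ratio:", round(log_ratio.mean(), 3))
```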
Topics
Topic in Applied Sciences, Future Internet, AI, Analytics, BDCC
Data Intelligence and Computational Analytics
Topic Editors: Carson K. Leung, Fei Hao, Xiaokang Zhou
Deadline: 30 November 2026
Special Issues
Special Issue in Analytics
Reviews on Data Analytics and Its Applications
Guest Editor: Carson K. Leung
Deadline: 31 March 2026
Special Issue in Analytics
Critical Challenges in Large Language Models and Data Analytics: Trustworthiness, Scalability, and Societal Impact
Guest Editors: Oluwaseun Ajao, Bayode Ogunleye, Hemlata Sharma
Deadline: 31 July 2026