Analytics, Volume 3, Issue 3 (September 2024) – 7 articles

17 pages, 327 KB  
Article
Directed Topic Extraction with Side Information for Sustainability Analysis
by Maria Osipenko
Analytics 2024, 3(3), 389-405; https://doi.org/10.3390/analytics3030021 - 11 Sep 2024
Viewed by 1876
Abstract
Topic analysis represents each document in a text corpus in a low-dimensional latent topic space. In some cases, the desired topic representation is subject to specific requirements or guidelines constituting side information. For instance, sustainability-aware investors might be interested in automatically assessing aspects of firm sustainability based on the textual content of its corporate reports, focusing on the established 17 UN sustainability goals. The main corpus consists of the corporate report texts, while the texts containing the definitions of the 17 UN sustainability goals represent the side information. Under the assumption that both text corpora share a common low-dimensional subspace, we propose representing them in such a space via directed topic extraction using matrix co-factorization. Both the main and the side text corpora are first represented as term–context matrices, which are then jointly decomposed into word–topic and topic–context matrices. The word–topic matrix is common to both text corpora, whereas the topic–context matrices contain specific representations in the shared topic space. A nuisance parameter, which allows us to shift the focus between the error minimization of individual factorization terms, controls the extent to which the side information is taken into account. With our approach, documents from the main and the side corpora can be related to each other in the resulting latent topic space. That is, the corporate reports are represented in the same latent topic space as the descriptions of the 17 UN sustainability goals, enabling a structured automatic sustainability assessment of the reports’ textual content. We provide an algorithm for such directed topic extraction and propose techniques for visualizing and interpreting the results.
(This article belongs to the Special Issue Business Analytics and Applications)
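
The joint decomposition described in the abstract above can be illustrated with a minimal sketch. It is not the author's algorithm: it assumes non-negative term–context matrices, a squared Frobenius loss, a weighting parameter `alpha` in [0, 1] standing in for the parameter that shifts focus between the two factorization terms, and standard multiplicative updates. All function and variable names (`cofactorize`, `topic_similarity`, `X_main`, `X_side`) are hypothetical.

```python
import numpy as np


def joint_loss(X_main, X_side, W, H_main, H_side, alpha):
    """Weighted sum of the two squared Frobenius reconstruction errors."""
    err_main = np.linalg.norm(X_main - W @ H_main) ** 2
    err_side = np.linalg.norm(X_side - W @ H_side) ** 2
    return (1.0 - alpha) * err_main + alpha * err_side


def cofactorize(X_main, X_side, n_topics, alpha=0.5, n_iter=200, seed=0, eps=1e-9):
    """Factorize X_main ~ W @ H_main and X_side ~ W @ H_side with a shared W.

    W is the word-topic matrix common to both corpora; H_main and H_side are
    the corpus-specific topic-context matrices. alpha = 0 ignores the side
    corpus when fitting W, alpha = 1 ignores the main corpus (assumed weighting).
    """
    rng = np.random.default_rng(seed)
    n_terms = X_main.shape[0]
    W = rng.random((n_terms, n_topics)) + eps
    H_main = rng.random((n_topics, X_main.shape[1])) + eps
    H_side = rng.random((n_topics, X_side.shape[1])) + eps
    w_main, w_side = 1.0 - alpha, alpha
    history = []
    for _ in range(n_iter):
        # Update each topic-context matrix with W held fixed.
        H_main *= (W.T @ X_main) / (W.T @ W @ H_main + eps)
        H_side *= (W.T @ X_side) / (W.T @ W @ H_side + eps)
        # Update the shared word-topic matrix using both corpora.
        numer = w_main * (X_main @ H_main.T) + w_side * (X_side @ H_side.T)
        denom = W @ (w_main * (H_main @ H_main.T) + w_side * (H_side @ H_side.T)) + eps
        W *= numer / denom
        history.append(joint_loss(X_main, X_side, W, H_main, H_side, alpha))
    return W, H_main, H_side, history


def topic_similarity(H_main, H_side, eps=1e-12):
    """Cosine similarity between main and side contexts in the shared topic space."""
    A = H_main / (np.linalg.norm(H_main, axis=0, keepdims=True) + eps)
    B = H_side / (np.linalg.norm(H_side, axis=0, keepdims=True) + eps)
    return A.T @ B  # shape: (n_main_contexts, n_side_contexts)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_main = rng.random((30, 12))  # synthetic stand-in: 30 terms x 12 report contexts
    X_side = rng.random((30, 5))   # synthetic stand-in: 30 terms x 5 goal descriptions
    W, H_main, H_side, history = cofactorize(X_main, X_side, n_topics=4, alpha=0.3)
    print("loss after first and last iteration:", history[0], history[-1])
    print("similarity matrix shape:", topic_similarity(H_main, H_side).shape)
```

Because both corpora share `W`, the columns of `H_main` and `H_side` live in the same topic space, which is what makes the cosine comparison in `topic_similarity` meaningful. The data here are random placeholders, not the corpora used in the article.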

21 pages, 389 KB  
Article
SIMEX-Based and Analytical Bias Corrections in Stocking–Lord Linking
by Alexander Robitzsch
Analytics 2024, 3(3), 368-388; https://doi.org/10.3390/analytics3030020 - 6 Aug 2024
Cited by 3 | Viewed by 1475
Abstract
Stocking–Lord (SL) linking is a popular linking method for group comparisons based on dichotomous item responses. This article proposes a bias correction technique based on the simulation extrapolation (SIMEX) method for SL linking in the 2PL model in the presence of uniform differential item functioning (DIF). The SIMEX-based method is compared to the analytical bias correction methods of SL linking. In a simulation study, SIMEX-based SL linking performed best; it is also easy to implement and can be adapted straightforwardly to other linking methods.
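
For readers unfamiliar with the setting, the 2PL model and uniform DIF mentioned in the abstract above can be written out as follows. This is the common textbook parametrization, which may differ from the one used in the article; the symbols $a_i$, $b_i$, $\theta_p$, and $e_{ig}$ are notation assumed here for illustration.

```latex
% 2PL item response function for person p and item i
% (a_i: discrimination, b_i: difficulty, \theta_p: ability)
P(X_{pi} = 1 \mid \theta_p) = \frac{1}{1 + \exp\bigl(-a_i(\theta_p - b_i)\bigr)}

% Uniform DIF: in group g the item difficulty is shifted by e_{ig},
% while the discrimination a_i is the same in every group
b_{ig} = b_i + e_{ig}
```

Under this reading, uniform DIF moves an item's response curve left or right for one group without changing its slope.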

24 pages, 7013 KB  
Article
Comparative Analysis of Nature-Inspired Metaheuristic Techniques for Optimizing Phishing Website Detection
by Thomas Nagunwa
Analytics 2024, 3(3), 344-367; https://doi.org/10.3390/analytics3030019 - 6 Aug 2024
Cited by 3 | Viewed by 2295
Abstract
The increasing number, frequency, and sophistication of phishing website-based attacks necessitate the development of robust solutions for detecting phishing websites to enhance the overall security of cyberspace. Drawing inspiration from natural processes, nature-inspired metaheuristic techniques have been proven to be efficient in solving complex optimization problems in diverse domains. Following these successes, this research paper aims to investigate the effectiveness of metaheuristic techniques, particularly Genetic Algorithms (GAs), Differential Evolution (DE), and Particle Swarm Optimization (PSO), in optimizing the hyperparameters of machine learning (ML) algorithms for detecting phishing websites. Six ensemble classifiers were trained on each of multiple datasets, and their hyperparameters were optimized using each metaheuristic technique. As a baseline for assessing performance improvement, the classifiers were also trained with the default hyperparameters. To validate the genuine impact of the techniques over the use of default hyperparameters, we conducted statistical tests on the accuracy scores of all the optimized classifiers. The results show that the GA was the most effective technique, improving the accuracy scores of all the classifiers, followed by DE, which improved four of the six classifiers. PSO was the least effective, improving only one classifier. It was also found that GA-optimized Gradient Boosting, LGBM, and XGBoost were the best classifiers across all the metrics in predicting phishing websites, achieving peak accuracy scores of 98.98%, 99.24%, and 99.47%, respectively.

26 pages, 808 KB  
Article
A Longitudinal Tree-Based Framework for Lapse Management in Life Insurance
by Mathias Valla
Analytics 2024, 3(3), 318-343; https://doi.org/10.3390/analytics3030018 - 5 Aug 2024
Cited by 1 | Viewed by 1645
Abstract
Developing an informed lapse management strategy (LMS) is critical for life insurers to improve profitability and gain insight into the risk of their global portfolio. Prior research in actuarial science has shown that targeting policyholders by maximising their individual customer lifetime value is more advantageous than targeting all those likely to lapse. However, most existing lapse analyses do not leverage the variability of features and targets over time. We propose a longitudinal LMS framework, utilising tree-based models for longitudinal data, such as left-truncated and right-censored (LTRC) trees and forests, as well as mixed-effect tree-based models. Our methodology provides time-informed insights, leading to increased precision in targeting. Our findings indicate that the use of longitudinally structured data significantly enhances the precision of models in predicting lapse behaviour, estimating customer lifetime value, and evaluating individual retention gains. The implementation of mixed-effect random forests enables the production of time-varying predictions that are highly relevant for decision-making. This paper contributes to the field of lapse analysis for life insurers by demonstrating the importance of exploiting the complete past trajectory of policyholders, which is often available in insurers’ information systems but has yet to be fully utilised.
(This article belongs to the Special Issue Business Analytics and Applications)

21 pages, 762 KB  
Article
Enhancing Talent Recruitment in Business Intelligence Systems: A Comparative Analysis of Machine Learning Models
by Hikmat Al-Quhfa, Ali Mothana, Abdussalam Aljbri and Jie Song
Analytics 2024, 3(3), 297-317; https://doi.org/10.3390/analytics3030017 - 15 Jul 2024
Cited by 12 | Viewed by 3880
Abstract
In the competitive field of business intelligence, optimizing talent recruitment through data-driven methodologies is crucial for better decision-making. This study compares the effectiveness of various machine learning models to improve recruitment accuracy and efficiency. Using recruitment data from a major Yemeni organization (2019–2022), we evaluated models including K-Nearest Neighbors, Logistic Regression, Support Vector Machine, Naive Bayes, Decision Trees, Random Forest, Gradient Boosting Classifier, AdaBoost Classifier, and Neural Networks. Hyperparameter tuning and cross-validation were used for optimization. The Random Forest model achieved the highest accuracy (92.8%), followed by Neural Networks (92.6%) and Gradient Boosting Classifier (92.5%). These results suggest that advanced machine learning models, particularly Random Forest and Neural Networks, can significantly enhance recruitment processes in business intelligence systems. This study provides valuable insights for recruiters, advocating for the integration of sophisticated machine learning techniques in talent acquisition strategies.

21 pages, 15154 KB  
Communication
Modeling Sea Level Rise Using Ensemble Techniques: Impacts on Coastal Adaptation, Freshwater Ecosystems, Agriculture and Infrastructure
by Sambandh Bhusan Dhal, Rishabh Singh, Tushar Pandey, Sheelabhadra Dey, Stavros Kalafatis and Vivekvardhan Kesireddy
Analytics 2024, 3(3), 276-296; https://doi.org/10.3390/analytics3030016 - 5 Jul 2024
Viewed by 1680
Abstract
Sea level rise (SLR) is a crucial indicator of climate change, primarily driven by greenhouse gas emissions and the subsequent increase in global temperatures. The impact of SLR, however, varies regionally due to factors such as ocean bathymetry, resulting in distinct shifts across different areas compared to the global average. Understanding the complex factors influencing SLR across diverse spatial scales, along with the associated uncertainties, is essential. This study focuses on the East Coast of the United States and the Gulf of Mexico, utilizing historical SLR data from 1993 to 2023. To forecast SLR trends from 2024 to 2103, a weighted ensemble model comprising SARIMAX, LSTM, and exponential smoothing models was employed. Additionally, using historical greenhouse gas data, an ensemble of LSTM models was used to predict real-time SLR values, achieving a testing loss of 0.005. Furthermore, conductance and dissolved oxygen (DO) values were assessed for the entire forecasting period, leveraging forecasted SLR trends to evaluate the impacts on marine life, agriculture, and infrastructure.
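
The weighted ensemble mentioned in the abstract above can be sketched in a few lines. The abstract does not say how the weights were chosen, so this example assumes weights inversely proportional to each model's validation error; the function names and all numbers are hypothetical placeholders, not results from the study.

```python
import numpy as np


def inverse_error_weights(val_errors):
    """Turn per-model validation errors into weights that sum to 1.

    Assumed scheme: a model with half the error gets twice the weight.
    """
    errors = np.asarray(val_errors, dtype=float)
    if (errors <= 0).any():
        raise ValueError("validation errors must be strictly positive")
    inverse = 1.0 / errors
    return inverse / inverse.sum()


def weighted_ensemble(forecasts, weights):
    """Combine forecasts of shape (n_models, horizon) into one series."""
    forecasts = np.asarray(forecasts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if forecasts.shape[0] != weights.shape[0]:
        raise ValueError("need exactly one weight per model")
    return weights @ forecasts


if __name__ == "__main__":
    # Placeholder forecasts standing in for the three model families
    # named in the abstract (one row per model, one column per step).
    forecasts = [
        [3.1, 3.3, 3.6],  # e.g. a SARIMAX-style model
        [3.0, 3.4, 3.9],  # e.g. an LSTM
        [3.2, 3.3, 3.4],  # e.g. exponential smoothing
    ]
    weights = inverse_error_weights([0.8, 0.5, 1.0])  # placeholder errors
    print("weights:", weights)
    print("ensemble forecast:", weighted_ensemble(forecasts, weights))
```

Other weighting schemes (equal weights, or weights fitted by regression on a hold-out set) fit the same `weighted_ensemble` function unchanged.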

21 pages, 737 KB  
Article
TaskFinder: A Semantics-Based Methodology for Visualization Task Recommendation
by Darius Coelho, Bhavya Ghai, Arjun Krishna, Maria Velez-Rojas, Steve Greenspan, Serge Mankovski and Klaus Mueller
Analytics 2024, 3(3), 255-275; https://doi.org/10.3390/analytics3030015 - 4 Jul 2024
Viewed by 1939
Abstract
Data visualization has entered the mainstream, and numerous visualization recommender systems have been proposed to assist visualization novices, as well as busy professionals, in selecting the most appropriate type of chart for their data. Given a dataset and a set of user-defined analytical tasks, these systems can make recommendations based on expert-coded visualization design principles or empirical models. However, the need to identify the pertinent analytical tasks beforehand still exists and often requires domain expertise. In this work, we aim to automate this step with TaskFinder, a prototype system that leverages the information available in textual documents to understand domain-specific relations between attributes and tasks. TaskFinder employs word vectors as well as a custom dependency parser along with an expert-defined list of task keywords to extract and rank associations between tasks and attributes. It pairs these associations with a statistical analysis of the dataset to filter out tasks that are irrelevant given the data. TaskFinder ultimately produces a ranked list of attribute–task pairs. We show that the number of domain articles needed to converge to a recommendation consensus is bounded for our approach. We demonstrate TaskFinder over multiple domains with varying article types and quantities.