MDPI - Publisher of Open Access Journals

27 pages, 5252 KiB

Open Access | Article

Mathematical Modeling and Clustering Framework for Cyber Threat Analysis Across Industries

by Fahim Sufi and Musleh Alsulami

Mathematics 2025, 13(4), 655; https://doi.org/10.3390/math13040655 - 17 Feb 2025

Cited by 2 | Viewed by 1131

Abstract

The escalating prevalence of cyber threats across industries underscores the urgent need for robust analytical frameworks to understand their clustering, prevalence, and distribution. This study addresses the challenge of quantifying and analyzing relationships between 95 distinct cyberattack types and 29 industry sectors, using a dataset of 9261 entries filtered from over 1 million news articles. Existing approaches often fail to capture nuanced patterns in such complex datasets, which motivates new methodologies. We present a rigorous mathematical framework integrating chi-square tests, Bayesian inference, Gaussian Mixture Models (GMMs), and Spectral Clustering. This framework identifies key patterns, such as 1150 Zero-Day Exploits clustered in the IT and Telecommunications sector, 732 Advanced Persistent Threats (APTs) in Government and Public Administration, and Malware dominating the Healthcare sector with a posterior probability of 0.287. Temporal analyses reveal periodic spikes, for example in Zero-Day Exploits, and a persistent presence of Social Engineering Attacks, with 1397 occurrences across industries. These findings are quantified using significance scores (mean: 3.25 ± 0.7) and posterior probabilities, providing evidence of industry-specific vulnerabilities. This research offers actionable insights for policymakers, cybersecurity professionals, and organizational decision makers by giving them a data-driven understanding of sector-specific risks. The mathematical formulations are replicable and scalable, enabling organizations to allocate resources effectively and develop proactive defenses against emerging threats. By bridging mathematical theory and real-world cybersecurity challenges, this study contributes to safeguarding critical infrastructure and digital assets.

(This article belongs to the Special Issue Analytical Frameworks and Methods for Cybersecurity, 2nd Edition)
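
The abstract does not spell out its computations. As a rough illustration only, two of the basic ingredients it names, a chi-square test of independence on an attack-type × sector count table and a posterior probability of an attack type given a sector, can be sketched in plain Python. The attack and sector labels and the counts are hypothetical, and the posterior here is the plug-in estimate from applying Bayes' rule to observed frequencies, which may differ from the authors' Bayesian model.

```python
from typing import Dict, List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows (attacks) and columns (sectors)
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def posterior_attack_given_sector(table: List[List[float]], attacks: List[str],
                                  sectors: List[str], sector: str) -> Dict[str, float]:
    """P(attack | sector) = P(sector | attack) * P(attack) / P(sector), from raw counts.

    Rows are attack types and columns are sectors (hypothetical layout).
    """
    j = sectors.index(sector)
    total = sum(sum(row) for row in table)
    p_sector = sum(row[j] for row in table) / total
    result = {}
    for i, attack in enumerate(attacks):
        p_attack = sum(table[i]) / total
        p_sector_given_attack = table[i][j] / sum(table[i])
        result[attack] = p_sector_given_attack * p_attack / p_sector
    return result
```

With purely empirical frequencies, Bayes' rule reduces to the attack's share of the sector's column total. For a sparse 95 × 29 table, cells with expected counts below about 5 make the chi-square approximation unreliable, so pooling rare categories or using a simulation-based test would be worth considering.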

22 pages, 460 KiB

Open Access | Article

Test of the Equality of Several High-Dimensional Covariance Matrices: A Normal-Reference Approach

by Jingyi Wang, Tianming Zhu and Jin-Ting Zhang

Mathematics 2025, 13(2), 295; https://doi.org/10.3390/math13020295 - 17 Jan 2025

Viewed by 869

Abstract

As the field of big data continues to evolve, there is a growing need to test the equality of multiple high-dimensional covariance matrices. Many existing methods rely on approximations to the null distribution of the test statistic, or to its extreme-value distribution, under stringent conditions, leading to tests that are either overly liberal or overly conservative. Consequently, these methods often lack robustness on real-world data, where the required assumptions are hard to verify. To address these challenges, we introduce a novel test statistic based on the normal-reference approach. We show that, under certain regularity conditions, the null distribution of this statistic has the same limiting distribution as a chi-square-type mixture, whose distribution can be reliably estimated from data using the three-cumulant matched chi-square approximation. We also establish the asymptotic power of the proposed test. In comprehensive simulation studies and a real data analysis, the proposed test controls size better than several competing methods.

(This article belongs to the Special Issue Computational Statistics and Data Analysis, 2nd Edition)
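
The three-cumulant matched chi-square approximation replaces a chi-square-type mixture T = Σ_r w_r A_r (A_r i.i.d. χ²₁, w_r > 0) by β₀ + β₁χ²_d, choosing β₀, β₁ and d so that the first three cumulants agree. A minimal sketch, assuming the weights w_r are already known; in practice they would have to be estimated from data, which this sketch does not attempt.

```python
from typing import Sequence, Tuple


def three_cumulant_chi2_match(weights: Sequence[float]) -> Tuple[float, float, float]:
    """Approximate T = sum_r w_r * A_r (A_r iid chi2_1, w_r > 0) by beta0 + beta1 * chi2_d.

    Returns (beta0, beta1, d) such that the mean, variance and third cumulant
    of the approximation equal those of T.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be non-empty and positive")
    # Cumulants of chi2_1 are 1, 2, 8, so kappa_l(T) = c_l * sum_r w_r^l.
    k1 = sum(weights)
    k2 = 2.0 * sum(w ** 2 for w in weights)
    k3 = 8.0 * sum(w ** 3 for w in weights)
    # Solve beta0 + beta1*d = k1, 2*beta1^2*d = k2, 8*beta1^3*d = k3.
    beta1 = k3 / (4.0 * k2)
    d = 8.0 * k2 ** 3 / k3 ** 2
    beta0 = k1 - 2.0 * k2 ** 2 / k3
    return beta0, beta1, d
```

An approximate p-value is then P(χ²_d ≥ (T_obs − β₀)/β₁); with SciPy this could be evaluated as `scipy.stats.chi2.sf((t_obs - beta0) / beta1, d)`, where d need not be an integer.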

20 pages, 5617 KiB

Open Access | Article

From Bilingualism to Multilingualism: Mapping Language Dynamics in the Linguistic Landscape of Hispanic Philadelphia

by Daniel Guarín

Languages 2024, 9(4), 123; https://doi.org/10.3390/languages9040123 - 1 Apr 2024

Cited by 6 | Viewed by 3122

Abstract

This study explores the linguistic landscape (LL) of three Hispanic neighborhoods in Philadelphia, PA, aiming to document and measure the presence of Spanish in public spaces and to understand how time, location, and establishment type influence language use. Based on 3437 signs analyzed from 2021 to 2023, our findings reveal that English dominates the LL at 61.65%, while Spanish accounts for 24.16%. Chi-squared tests confirm the effect of time and location on language use, highlighting a rise in bilingual and monolingual Spanish signs over time. In addition, variables were grouped into clusters using a heatmap to explore language use across different establishment types. Bilingualism emerges as a sustained trend, reflecting inclusivity and linguistic diversity in these contexts. Location analysis reveals distinct linguistic characteristics in each neighborhood, reflecting the cultural and linguistic diversity of their communities. The Golden Block shows a prevalence of bilingual signs, indicative of evolving demographics. Olney shows language mixing due to diverse ethnic and sociolinguistic influences, while South Philadelphia’s Italian Market area features prevalent Spanish and multilingual signage. The study underscores the growing presence of Spanish and other minority languages, emphasizing the need to recognize and revitalize linguistic diversity in urban spaces. As cities evolve, continued exploration of the LL is crucial to understanding language dynamics in relation to identity, culture, and power.

(This article belongs to the Special Issue Spanish in the US: A Sociolinguistic Approach)

21 pages, 1490 KiB

Open Access | Feature Paper | Article

Testing Equality of Several Distributions at High Dimensions: A Maximum-Mean-Discrepancy-Based Approach

by Zhi Peng Ong, Aixiang Andy Chen, Tianming Zhu and Jin-Ting Zhang

Mathematics 2023, 11(20), 4374; https://doi.org/10.3390/math11204374 - 21 Oct 2023

Viewed by 3008

Abstract

With the development of modern data collection techniques, researchers often encounter high-dimensional data across various research fields. An important problem is to determine whether several groups of such high-dimensional data originate from the same population. To address this, this paper presents a novel k-sample test of equal distributions for high-dimensional data based on the Maximum Mean Discrepancy (MMD). The test statistic is constructed from a V-statistic-based estimator of the squared MMD for several samples. The asymptotic null and alternative distributions of the test statistic are derived, and three simple methods are described for accurately approximating the null distribution. Two simulation studies and a real data example demonstrate the effectiveness and reliability of the test in practical applications.

(This article belongs to the Special Issue Advances of Functional and High-Dimensional Data Analysis)
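
For readers unfamiliar with the MMD, the V-statistic estimator of the squared MMD between two samples X and Y with kernel k is the average of k over X × X, plus the average over Y × Y, minus twice the average over X × Y. The sketch below computes it with a Gaussian kernel. The kernel, its default bandwidth, and the pairwise-sum aggregation into a k-sample statistic are illustrative assumptions, since the abstract does not state the kernel or how the paper combines several samples.

```python
import math
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, bandwidth: float) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: List[Point], ys: List[Point], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_v(xs: List[Point], ys: List[Point], bandwidth: float = 1.0) -> float:
    """V-statistic (biased) estimate of the squared MMD between two samples."""
    return (mean_kernel(xs, xs, bandwidth) + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def k_sample_mmd2(samples: List[List[Point]], bandwidth: float = 1.0) -> float:
    """Hypothetical k-sample aggregate: sum of pairwise squared-MMD estimates."""
    return sum(mmd2_v(a, b, bandwidth) for a, b in combinations(samples, 2))
```

Because the Gaussian kernel is positive definite, each V-statistic term is a squared RKHS distance between empirical mean embeddings and is therefore non-negative; large values suggest the groups do not share one distribution.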

36 pages, 4029 KiB

Open Access | Article

Detection of Diabetes through Microarray Genes with Enhancement of Classifiers Performance

by Dinesh Chellappan and Harikumar Rajaguru

Diagnostics 2023, 13(16), 2654; https://doi.org/10.3390/diagnostics13162654 - 11 Aug 2023

Cited by 7 | Viewed by 1868

Abstract

Diabetes mellitus is a life-threatening, non-communicable chronic disease with a significant global impact, and its timely detection is necessary for effective treatment. The primary objective of this study is to propose a novel approach for identifying type II diabetes mellitus using microarray gene data, with a focus on improving the performance of diabetes detection methods. Four dimensionality reduction techniques, namely Detrended Fluctuation Analysis (DFA), the Chi-square probability density function (Chi2pdf), the Firefly algorithm, and Cuckoo Search, are used to reduce the high-dimensional data. Metaheuristic algorithms, namely Particle Swarm Optimization (PSO) and Harmonic Search (HS), are used for feature selection. Seven classifiers, namely Non-Linear Regression (NLR), Linear Regression (LR), Logistic Regression (LoR), Gaussian Mixture Model (GMM), Bayesian Linear Discriminant Classifier (BLDC), Softmax Discriminant Classifier (SDC), and Support Vector Machine with a Radial Basis Function kernel (SVM-RBF), are used to separate diabetic from non-diabetic samples. Classifier performance is evaluated using accuracy, recall, precision, F1 score, error rate, Matthews Correlation Coefficient (MCC), Jaccard metric, and kappa. The SVM-RBF classifier, combined with Chi2pdf dimensionality reduction and PSO feature selection, attained the highest accuracy of 91% with a kappa of 0.7961, outperforming all other classifiers.

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
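
The evaluation metrics listed in this abstract all follow from a 2 × 2 confusion matrix. A minimal sketch, assuming "diabetic" is the positive class and that no denominator is zero; the counts used in the example are hypothetical and not taken from the study.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Metrics for a 2 x 2 confusion matrix (positive class assumed to be 'diabetic')."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    jaccard = tp / (tp + fp + fn)
    # Matthews Correlation Coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cohen's kappa: observed agreement vs. agreement expected from the marginals
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "jaccard": jaccard,
        "mcc": mcc,
        "kappa": kappa,
    }
```

For example, `binary_metrics(tp=40, fp=5, tn=50, fn=5)` gives an accuracy of 0.9, a Jaccard index of 0.8 and, for this symmetric table, kappa and MCC both equal to 79/99 (about 0.798). Kappa and MCC are useful alongside accuracy because they discount agreement that the class marginals alone would produce.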

16 pages, 2355 KiB

Open Access | Article

Joint Action Toxicity of Arsenic (As) and Lead (Pb) Mixtures in Developing Zebrafish

by Keturah Kiper and Jennifer L. Freeman

Biomolecules 2022, 12(12), 1833; https://doi.org/10.3390/biom12121833 - 8 Dec 2022

Cited by 9 | Viewed by 2801

Abstract

Arsenic (As) and lead (Pb) are environmental pollutants that occur at common sites and are linked to similar adverse health effects. Multiple studies have investigated the toxicity of each metal individually or in complex mixtures. Studies defining the joint interaction of a binary exposure to As and Pb, especially during the earliest stages of development, are limited and lack confirmation of the predicted mixture interaction. We hypothesized that a mixture of As (iAsIII) and Pb would show a concentration addition (CA) interaction, informed by the common toxicity pathways of the two metals. To test this hypothesis, developing zebrafish (1–120 h post fertilization; hpf) were first exposed separately to a wide range of As or Pb concentrations to determine lethal concentrations at 120 hpf. These data were then used in the CA and independent action (IA) models to predict the type of mixture interaction from co-exposure to As and Pb. Three titration mixture experiments were completed to test the predicted As and Pb mixture interaction, keeping the Pb concentration constant and varying the As concentration in each experiment. The prediction accuracy of the two models was then evaluated using the prediction deviation ratio (PDR) and a chi-square test, and regression modeling was applied to determine the type of interaction. Individual metal exposures identified the As and Pb concentrations at which 25% (39.0 ppm Pb, 40.2 ppm As), 50% (73.8 ppm Pb, 55.4 ppm As), 75% (99.9 ppm Pb, 66.6 ppm As), and 100% (121.7 ppm Pb, 77.3 ppm As) lethality was observed at 120 hpf. These data were used to graph the predicted mixture interaction under the CA and IA models. The titration experiments provided observational data to assess the predictions. PDR values for the CA model approached 1, whereas all PDR values for the IA model deviated substantially from the predicted data.
In addition, the chi-square test showed that most observed results differed significantly from the predictions, except in the first experiment (Pb LC₂₅ held constant) with the CA model. Regression modeling for the IA model showed a primarily synergistic response across all exposure scenarios, whereas the CA model indicated an additive response at lower exposure concentrations and synergism at higher exposure concentrations. The CA model predicted the Pb and As binary mixture interaction better than the IA model and was able to delineate the types of mixture interaction among different binary exposure scenarios.

(This article belongs to the Special Issue Toxic and Essential Metals in Human Health and Disease 2022-2023)
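
The two reference models named in this abstract have simple textbook forms: independent action (IA) predicts a combined effect of 1 − Π(1 − E_i) from the single-metal effects E_i, and concentration addition (CA) predicts an x% effect when the toxic units Σ c_i/ECx_i sum to 1. The sketch below implements both. The helper `ca_partner_concentration` is a hypothetical name introduced here, and the worked numbers are illustrative arithmetic on the reported LC₅₀ values, not results from the study.

```python
from typing import Sequence


def ia_mixture_effect(single_effects: Sequence[float]) -> float:
    """Independent action: E_mix = 1 - prod_i (1 - E_i), effects given as fractions."""
    unaffected = 1.0
    for effect in single_effects:
        unaffected *= 1.0 - effect
    return 1.0 - unaffected


def ca_toxic_units(concentrations: Sequence[float],
                   effect_concentrations: Sequence[float]) -> float:
    """Concentration addition: sum_i c_i / ECx_i; a value of 1.0 predicts an x% effect."""
    return sum(c / ec for c, ec in zip(concentrations, effect_concentrations))


def ca_partner_concentration(fixed_conc: float, fixed_ecx: float,
                             partner_ecx: float) -> float:
    """Hypothetical helper: concentration of the second metal that brings the
    CA toxic-unit sum to exactly 1 when the first metal is held at fixed_conc."""
    return partner_ecx * (1.0 - fixed_conc / fixed_ecx)
```

For instance, with Pb held at 39.0 ppm and the reported 120 hpf LC₅₀ values (73.8 ppm Pb, 55.4 ppm As), `ca_partner_concentration(39.0, 73.8, 55.4)` returns about 26.1 ppm As as the CA-predicted concentration for 50% lethality, while IA applied to two exposures that each cause 25% lethality predicts 1 − 0.75² = 43.75%.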

10 pages, 493 KiB

Open Access | Communication

Phytochemical Profile, Safety and Efficacy of a Herbal Mixture Used for Contraception by Traditional Health Practitioners in Ngaka Modiri Molema District Municipality, South Africa

by Molelekwa Arthur Moroole, Simeon Albert Materechera, Wilfred Otang-Mbeng, Rose Hayeshi, Cor Bester and Adeyemi Oladapo Aremu

Plants 2022, 11(2), 193; https://doi.org/10.3390/plants11020193 - 12 Jan 2022

Cited by 5 | Viewed by 3836

Abstract

The use of medicinal plants for contraception remains a common practice among South African ethnic groups. The present study assessed the phytochemical profile, cytotoxicity, acute oral toxicity and efficacy of a herbal mixture used for contraception by the Batswana of South Africa. An aqueous extract was prepared from equal weights of Bulbine frutescens (roots), Helichrysum caespititium (leaves) and Teucrium trifidum (leaves), following a recipe used by traditional health practitioners. The phytochemical profile of the freeze-dried herbal mixture was analyzed using gas chromatography–mass spectrometry (GC-MS). In addition, cytotoxicity was determined using an MTT assay on Vero cells, and in vivo contraceptive efficacy was evaluated using seven Sprague Dawley rats per control and treatment group. The control group received distilled water, while the test groups received 5, 50 and 300 mg/kg of the herbal mixture, administered orally once a day for three consecutive days. Female rats were then paired 1:1 with males for 3 days. Their weights were measured weekly and the incidence of pregnancy was recorded. The GC-MS chromatogram revealed 12 identified and 9 unidentified compounds. In terms of safety, the herbal mixture had an IC₅₀ value of 755.2 μg/mL, and the highest tested dose, 2000 mg/kg, caused no mortality or morbidity in the rats. The 50 mg/kg dose of the herbal mixture extract achieved a contraceptive efficacy of 14.5%, whereas the other doses had no effect, as all rats in those groups became pregnant. Based on a chi-square test (p < 0.05), the tested doses of the herbal mixture showed no association with contraception or with the weight of the rats. Overall, the herbal mixture extract was found to be safe but had limited contraceptive efficacy at the tested doses. Future studies should explore a wider dose range, other extraction solvents, and hormonal analyses.

(This article belongs to the Special Issue Medicinal Plant Extracts)

28 pages, 6399 KiB

Open Access | Article

A Constrained Generalized Functional Linear Model for Multi-Loci Genetic Mapping

by Jiayu Huang, Jie Yang, Zhangrong Gu, Wei Zhu and Song Wu

Stats 2021, 4(3), 550-577; https://doi.org/10.3390/stats4030033 - 25 Jun 2021

Cited by 1 | Viewed by 2587

Abstract

In genome-wide association studies (GWAS), efficiently incorporating linkage disequilibrium (LD) among densely typed genetic variants into association analysis is a critical yet challenging problem. Functional linear models (FLM), which impose a smoothing structure on the coefficients of correlated covariates, are advantageous in the genetic mapping of multiple variants in high LD. Here we propose a novel constrained generalized FLM (cGFLM) framework to perform simultaneous association tests on a block of linked SNPs for various trait types, including continuous, binary and zero-inflated count phenotypes. The cGFLM applies a set of inequality constraints on the FLM to ensure model identifiability under different genetic codings. The method is implemented via B-splines, and an augmented Lagrangian algorithm is employed for parameter estimation. For hypothesis testing, we derive a test statistic that accounts for the model constraints and follows a mixture of chi-square distributions. Simulation results show that cGFLM is effective in identifying causal loci and gene clusters compared with several competing methods based on single markers and SKAT-C. We applied the proposed method to a candidate gene-based COGEND study and to large-scale GWAS data on dental caries risk.
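
Under inequality constraints, test statistics commonly follow a probabilistic mixture of chi-square distributions (a chi-bar-squared law), P(T ≥ t) = Σ_i w_i P(χ²_i ≥ t), with weights w_i that depend on the constraint geometry. A minimal tail-probability sketch, assuming the weights are known; the 0.5/0.5 weights used in the checks are the classic single-boundary case, not the weights of the cGFLM statistic, which the abstract does not give.

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df >= x) for integer df >= 0 (closed forms)."""
    if df == 0:
        return 1.0 if x <= 0 else 0.0  # chi2_0 is a point mass at zero
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^{-x/2} * sum_j x^(j-1) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi_bar_sq_sf(t: float, weights: Sequence[float]) -> float:
    """Tail of the mixture sum_i w_i * chi2_i, where weights[i] multiplies chi2 with i df."""
    return sum(w * chi2_sf(t, df) for df, w in enumerate(weights))
```

With SciPy available, `scipy.stats.chi2.sf` could replace the hand-written `chi2_sf`; the closed forms are used here only to keep the sketch dependency-free.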

28 pages, 476 KiB

Open Access | Article

Chebyshev–Edgeworth-Type Approximations for Statistics Based on Samples with Random Sizes

by Gerd Christoph and Vladimir V. Ulyanov

Mathematics 2021, 9(7), 775; https://doi.org/10.3390/math9070775 - 2 Apr 2021

Cited by 2 | Viewed by 3142

Abstract

Second-order Chebyshev–Edgeworth expansions are derived for various statistics from samples with random sample sizes, where the asymptotic laws are scale mixtures of the standard normal or chi-square distributions with gamma or inverse exponential mixing distributions. A formal construction of asymptotic expansions is developed, so the results can be applied to a whole family of asymptotically normal or chi-square statistics. For the normal limit law, the random mean, the normalized Student t-distribution and the Student t-statistic under non-normality are considered. For the chi-square limit distribution, Hotelling’s generalized $T_0^2$ statistics and scale mixtures of chi-square distributions are used. We present the first Chebyshev–Edgeworth expansions for asymptotically chi-square statistics based on samples with random sample sizes. The statistics allow non-random, random, and mixed normalization factors. Depending on the type of normalization, three different limit distributions can arise for each statistic considered. The limit laws are the Student t-, standard normal, inverse Pareto, generalized gamma, Laplace and generalized Laplace distributions, as well as weighted sums of generalized gamma distributions. The paper continues the authors’ studies on approximating statistics for randomly sized samples.

(This article belongs to the Special Issue Analytical Methods and Convergence in Probability with Applications)
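
As a point of reference only, the classical first-order Edgeworth expansion for the standardized mean of a fixed number n of i.i.d. observations with skewness γ is P(√n(X̄ − μ)/σ ≤ x) ≈ Φ(x) − φ(x)·γ(x² − 1)/(6√n). The paper's second-order expansions for random sample sizes, with their non-normal limit laws, are not reproduced here; this sketch only shows the fixed-n correction term that such expansions build on.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def edgeworth_cdf_mean(x: float, skewness: float, n: int) -> float:
    """One-term Edgeworth approximation to P(sqrt(n) (Xbar - mu) / sigma <= x), fixed n.

    Adds the O(n^{-1/2}) skewness correction to the normal limit.
    """
    correction = normal_pdf(x) * skewness * (x * x - 1.0) / (6.0 * math.sqrt(n))
    return normal_cdf(x) - correction
```

With zero skewness the approximation collapses to Φ(x); for a right-skewed parent such as the exponential (γ = 2), the correction shifts probability mass toward the left of the centre for small n.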

Search Results (9)
