Data Modeling and Predictive Analytics

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: closed (30 June 2021) | Viewed by 35548

Special Issue Editor


Dr. M. Omair Shafiq
Guest Editor
Carleton University
Interests: big data analytics; data modeling; machine learning; web and social media analytics

Special Issue Information

Dear Colleagues,

Data modeling is carried out to improve the quality of data, which in turn helps downstream processes identify interesting patterns and make predictions. Much of the latest research and development aims to build data modeling, cleansing, and preparation techniques that improve the performance and effectiveness of analysis and prediction mechanisms. This requires advanced methods, techniques, and tools that model, represent, and prepare data in alignment with reasoning, analytics, and prediction.

The purpose of this Special Issue is to present the latest research in data modeling, predictive analytics, and related areas. Researchers and practitioners in these areas are invited to submit their original unpublished research works.

Topics of interest include but are not limited to:

  • Data modeling
  • Knowledge representation
  • Reasoning
  • Data preprocessing and transformation
  • Feature engineering
  • Predictive analytics
  • Ontologies and metadata
  • Knowledge graphs
  • Complex event modeling and processing
  • Cataloguing and organization
  • Semantic web and rules
  • Natural language modeling and processing
  • Rule-driven and data-driven reasoning
  • Prediction and forecasting
  • Machine learning
  • Deep learning

Dr. M. Omair Shafiq
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (10 papers)


Research

16 pages, 1028 KiB  
Article
Information Content Measurement of ESG Factors via Entropy and Its Impact on Society and Security
by Hossein Hassani, Stephan Unger and Mohammad Reza Entezarian
Information 2021, 12(10), 391; https://doi.org/10.3390/info12100391 - 23 Sep 2021
Cited by 5 | Viewed by 3420
Abstract
We conducted a singular and sectoral vulnerability assessment of the ESG factors of Dow-30-listed companies by applying the entropy weight method and analyzing each ESG factor’s information contribution to the overall ESG disclosure score. By reducing information entropy, weaknesses in the structure of a socio-technological system can be identified and improved. The relative information gain of each indicator improves proportionally to the reduction in entropy. The social pillar contains the most crucial information, followed by the environmental and governance pillars. The difference between the environmental and governance pillars was found to be not statistically significant, while the differences between the social pillar and, respectively, the environmental and governance pillars were statistically significant. This suggests noisy information content in the governance pillar, indicating potential for improvement in governance messaging. Moreover, we found that companies with lean and flexible governance structures are more likely to convey information content well. We also discuss the impact of ESG measures on society and security.
(This article belongs to the Special Issue Data Modeling and Predictive Analytics)
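For readers who want to experiment with the core technique, below is a minimal Python sketch of the entropy weight method described in the abstract. The toy score matrix and variable names are our own illustrative assumptions, not the authors’ data or code.

```python
import numpy as np

def entropy_weights(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (entropy e_j, weight w_j) for each indicator column of X."""
    n, m = X.shape
    # Normalise each column so entries can be read as proportions p_ij.
    P = X / X.sum(axis=0, keepdims=True)
    # Shannon entropy per indicator, scaled by 1/ln(n) so e_j lies in [0, 1].
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)
    e = -(P * logP).sum(axis=0) / np.log(n)
    # Lower entropy => more discriminating information => larger weight.
    d = 1.0 - e
    w = d / d.sum()
    return e, w

# Toy example: 5 companies x 3 ESG pillar scores (purely illustrative).
rng = np.random.default_rng(0)
scores = rng.uniform(10, 100, size=(5, 3))
e, w = entropy_weights(scores)
print("entropy per pillar:", e.round(3), "weights:", w.round(3))
```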

12 pages, 2821 KiB  
Article
DebtG: A Graph Model for Debt Relationship
by Huanqing Cui
Information 2021, 12(9), 347; https://doi.org/10.3390/info12090347 - 26 Aug 2021
Cited by 2 | Viewed by 1894
Abstract
Debt is common in daily transactions, but it may bring great harm to individuals, enterprises, and society, and may even lead to a debt crisis. This paper proposes DebtG, a weighted directed multi-arc graph model of debts among a large number of entities, including individuals, enterprises, banks, and governments. Both the vertices and the arcs of DebtG have attributes. The paper further defines three basic debt structures—the debt path, the debt tree, and the debt circuit—and presents algorithms to detect them, along with basic methods that use these structures to solve debt-clearing problems. Because the data collection and computation require a third-party platform, the paper also presents a profit analysis of such a platform. A case analysis is carried out using real-life data from enterprises in the Huangdao Zone. Finally, four key problems that should be addressed in the future are pointed out.
(This article belongs to the Special Issue Data Modeling and Predictive Analytics)
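The debt-circuit idea lends itself to a compact illustration. The sketch below cancels debts along directed cycles in a simple DiGraph; it simplifies the paper’s multi-arc, attributed DebtG model, and the graph layout and amounts are invented.

```python
import networkx as nx

def clear_one_circuit(G: nx.DiGraph) -> bool:
    """Find a directed debt cycle and offset it; return True if one was cleared."""
    try:
        cycle = nx.find_cycle(G)  # raises NetworkXNoCycle if acyclic
    except nx.NetworkXNoCycle:
        return False
    # Cancel the minimum amount along the cycle from every arc in it.
    offset = min(G[u][v]["debt"] for u, v in cycle)
    for u, v in cycle:
        G[u][v]["debt"] -= offset
        if G[u][v]["debt"] == 0:
            G.remove_edge(u, v)  # a fully cleared debt disappears
    return True

G = nx.DiGraph()
G.add_edge("A", "B", debt=50)   # A owes B 50
G.add_edge("B", "C", debt=30)
G.add_edge("C", "A", debt=20)   # A -> B -> C -> A is a debt circuit
while clear_one_circuit(G):
    pass
print(sorted((u, v, d["debt"]) for u, v, d in G.edges(data=True)))
```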

21 pages, 992 KiB  
Article
A Hybrid MultiLayer Perceptron Under-Sampling with Bagging Dealing with a Real-Life Imbalanced Rice Dataset
by Moussa Diallo, Shengwu Xiong, Eshete Derb Emiru, Awet Fesseha, Aminu Onimisi Abdulsalami and Mohamed Abd Elaziz
Information 2021, 12(8), 291; https://doi.org/10.3390/info12080291 - 22 Jul 2021
Cited by 1 | Viewed by 2197
Abstract
Classification algorithms have shown exceptional prediction results in supervised learning. However, these algorithms are not always efficient on real-life datasets, which are generally imbalanced in their class distributions. Several methods have been proposed to solve the class-imbalance problem. In this paper, we propose a hybrid method combining preprocessing techniques with ensemble learning. The original training set is undersampled by evaluating the samples with a stochastic measurement (SM) and then training a Multilayer Perceptron on the selected samples to return a balanced training set. The balanced training set produced by MLPUS (Multilayer Perceptron undersampling) is then aggregated using the bagging ensemble method. We applied our method to the real-life Niger_Rice dataset and to forty-four other imbalanced datasets from the KEEL repository. We also compared our method with six existing methods from the literature: the MLP classifier on the original imbalanced dataset, MLPUS, UnderBagging (combining random undersampling and bagging), RUSBoost, SMOTEBagging (Synthetic Minority Oversampling Technique and bagging), and SMOTEBoost. The results show that our method is competitive with these methods. On the Niger_Rice real-life dataset, the proposed method achieves 75.6% accuracy, 0.73 F-measure, 0.76 G-mean, and 0.86 ROC, whereas the MLP classifier on the original imbalanced Niger_Rice dataset gives 72.44% accuracy, 0.82 F-measure, 0.59 G-mean, and 0.76 ROC.
(This article belongs to the Special Issue Data Modeling and Predictive Analytics)
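As a rough illustration of the undersample-then-bag idea, the sketch below pairs plain random undersampling with a bagged ensemble of MLPs. It substitutes random selection for the paper’s stochastic-measurement (SM) scoring, so it is a simplified stand-in for MLPUS, not the authors’ method; the synthetic dataset is also ours.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def random_undersample(X, y, rng):
    """Drop majority-class samples until both classes have equal counts."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    keep = np.flatnonzero(y == minority)
    majority_idx = np.flatnonzero(y != minority)
    keep = np.concatenate([keep, rng.choice(majority_idx, size=counts.min(), replace=False)])
    return X[keep], y[keep]

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=42)
Xb, yb = random_undersample(Xtr, ytr, rng)
# Bag several MLPs on the balanced training set.
model = BaggingClassifier(MLPClassifier(max_iter=500), n_estimators=10, random_state=42)
model.fit(Xb, yb)
print("F-measure:", round(f1_score(yte, model.predict(Xte)), 3))
```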

14 pages, 1213 KiB  
Article
Research on Behavior Incentives of Prefabricated Building Component Manufacturers
by Pinbo Yao and Hongda Liu
Information 2021, 12(7), 284; https://doi.org/10.3390/info12070284 - 17 Jul 2021
Cited by 16 | Viewed by 2203
Abstract
Based on the positive externalities of prefabricated buildings, this paper constructs an evolutionary game model between the government and material component vendors and analyzes how the behavior of both parties changes across the stages of advancement of prefabricated buildings. Based on data modeling and equation-based predictive analysis, we find that at the initial stage the expansion of the incremental cost of construction inhibits the enthusiasm of the government. Nevertheless, the government’s incentives effectively affect the behavior of component vendors, and fiscal taxation and punishment policies push component vendors to provide prefabricated components. In the development stage, the influence of the government’s fiscal policy weakens, and component vendors’ behavior is driven mainly by the incremental costs and benefits of components. Additionally, the difference between the builder’s incremental cost and the sales revenue narrows, and the predicted behavior of both parties tends to be steady. In the mature stage, prefabricated buildings will rely mainly on market forces, and the government can gradually withdraw from the market. The cost variable tends to be lower, and it can be predicted that component vendors will tend to supply components, while the government will tend towards restrictive policies.
(This article belongs to the Special Issue Data Modeling and Predictive Analytics)
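A two-population replicator-dynamics sketch can convey the flavor of such a game model. All payoff parameters below are invented for illustration (the paper derives its own payoff matrices); with these numbers the system settles into the abstract’s mature-stage outcome, where vendors supply while the government withdraws its incentive.

```python
import numpy as np

# Hypothetical payoffs: S subsidy, C incremental cost of prefabrication,
# R extra sales revenue, B social benefit to the government.
S, C, R, B = 2.0, 3.0, 3.5, 4.0

def replicator_step(x, y, dt=0.01):
    """One Euler step; x = share of governments that incentivise,
    y = share of vendors that supply prefabricated components."""
    u_gov_inc = y * (B - S) - (1 - y) * S  # pays subsidy S, gains B if vendors supply
    u_gov_no = 0.5 * B * y                 # assumed: passive government still gets half the benefit
    u_ven_sup = x * S + R - C              # subsidy when incentivised, plus net margin R - C
    u_ven_no = 0.0
    dx = x * (1 - x) * (u_gov_inc - u_gov_no)
    dy = y * (1 - y) * (u_ven_sup - u_ven_no)
    return x + dt * dx, y + dt * dy

x, y = 0.3, 0.2
for _ in range(5000):
    x, y = replicator_step(x, y)
print(f"long-run shares: government incentivises {x:.2f}, vendors supply {y:.2f}")
```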

22 pages, 425 KiB  
Article
The Image Classification Method with CNN-XGBoost Model Based on Adaptive Particle Swarm Optimization
by Wenjiang Jiao, Xingwei Hao and Chao Qin
Information 2021, 12(4), 156; https://doi.org/10.3390/info12040156 - 09 Apr 2021
Cited by 15 | Viewed by 6351
Abstract
CNNs are particularly effective at extracting spatial features. However, the single-layer classifier constructed by the activation function in a CNN is easily disturbed by image noise, which reduces classification accuracy. To solve this problem, the advanced ensemble model XGBoost is used to overcome the deficiency of a single classifier in classifying image features. To further distinguish the extracted image features, a CNN-XGBoost image classification model optimized by APSO is proposed, in which APSO optimizes the hyper-parameters of the overall architecture to promote the fusion of the two-stage model. The model is composed of two parts: the feature extractor, a CNN that automatically extracts spatial features from images, and the feature classifier, XGBoost, which classifies the features extracted after convolution. In the parameter optimization, to overcome the tendency of the traditional PSO algorithm to fall into local optima, the improved APSO guides the particles’ search through space with two different strategies, which improves the diversity of the particle population and prevents the algorithm from becoming trapped in local optima. Results on the image set show that the proposed model achieves better image classification results. Moreover, the APSO-XGBoost model performs well on credit data, which indicates that the model has good credit-scoring ability.
(This article belongs to the Special Issue Data Modeling and Predictive Analytics)
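The tuning stage can be sketched with a vanilla PSO searching two XGBoost hyper-parameters, as below. The paper’s APSO adds adaptive search strategies and also spans the CNN stage, both omitted here; the digits dataset merely stands in for CNN-extracted features, and the search bounds are illustrative.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)  # stand-in for CNN-extracted features

def fitness(params):
    lr, depth = params
    clf = XGBClassifier(learning_rate=lr, max_depth=int(depth),
                        n_estimators=50, verbosity=0)
    return cross_val_score(clf, X, y, cv=3).mean()

rng = np.random.default_rng(1)
lo, hi = np.array([0.01, 2]), np.array([0.5, 8])
pos = rng.uniform(lo, hi, size=(8, 2))      # 8 particles over (learning_rate, max_depth)
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmax()]
for it in range(5):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    # Standard PSO update: inertia + cognitive + social terms.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    f = np.array([fitness(p) for p in pos])
    improved = f > pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmax()]
print("best (learning_rate, max_depth):", gbest.round(3), "cv acc:", pbest_f.max().round(3))
```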

14 pages, 735 KiB  
Article
A Markov Chain Monte Carlo Algorithm for Spatial Segmentation
by Nishanthi Raveendran and Georgy Sofronov
Information 2021, 12(2), 58; https://doi.org/10.3390/info12020058 - 30 Jan 2021
Cited by 4 | Viewed by 2024
Abstract
Spatial data are very often heterogeneous, which indicates that there may not be a unique simple statistical model describing the data. To overcome this issue, the data can be segmented into a number of homogeneous regions (or domains). Identifying these domains is one of the important problems in spatial data analysis. Spatial segmentation is used in many different fields including epidemiology, criminology, ecology, and economics. To solve this clustering problem, we propose to use the change-point methodology. In this paper, we develop a new spatial segmentation algorithm within the framework of the generalized Gibbs sampler. We estimate the average surface profile of binary spatial data observed over a two-dimensional regular lattice. We illustrate the performance of the proposed algorithm with examples using artificially generated and real data sets.
(This article belongs to the Special Issue Data Modeling and Predictive Analytics)
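To give a feel for MCMC-based spatial segmentation, the sketch below runs a standard single-site Gibbs sweep over a noisy binary lattice with an Ising-style smoothing prior. This is a generic illustration, not the paper’s generalized Gibbs sampler over change-point configurations; the lattice, noise level, and coupling are assumed.

```python
import numpy as np

rng = np.random.default_rng(7)
truth = np.zeros((40, 40), dtype=int)
truth[:, 20:] = 1                                                 # two homogeneous domains
data = np.where(rng.random(truth.shape) < 0.2, 1 - truth, truth)  # 20% observation noise

beta, p_flip = 1.0, 0.2
labels = data.copy()
for sweep in range(30):
    for i in range(labels.shape[0]):
        for j in range(labels.shape[1]):
            nb = [labels[x, y] for x, y in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                  if 0 <= x < labels.shape[0] and 0 <= y < labels.shape[1]]
            # Ising prior: favour agreement with neighbours.
            log_odds = beta * (2 * sum(nb) - len(nb))
            # Bernoulli noise likelihood for the observed pixel.
            log_odds += np.log((1 - p_flip) / p_flip) * (2 * data[i, j] - 1)
            labels[i, j] = int(rng.random() < 1 / (1 + np.exp(-log_odds)))
print("pixel agreement with truth:", (labels == truth).mean().round(3))
```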

13 pages, 460 KiB  
Article
Random Forest with Sampling Techniques for Handling Imbalanced Prediction of University Student Depression
by Siriporn Sawangarreerak and Putthiporn Thanathamathee
Information 2020, 11(11), 519; https://doi.org/10.3390/info11110519 - 05 Nov 2020
Cited by 24 | Viewed by 4071
Abstract
In this work, we propose a combined sampling technique to improve the performance of imbalanced classification of university student depression data. In our experiments, we found that combining random oversampling with the Tomek links undersampling method generated a relatively balanced depression dataset without losing significant information. Random oversampling was used to oversample the minority class so as to balance the number of samples between the classes; Tomek links undersampling was then used to remove depression data considered less relevant or noisy. The relatively balanced dataset was classified by a random forest. The results show an overall accuracy of 94.17% in predicting adolescent depression, outperforming the individual sampling techniques. Moreover, our proposed method was tested on another dataset for external validity, where its predictive accuracy was found to be 93.33%.
(This article belongs to the Special Issue Data Modeling and Predictive Analytics)
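The combined sampling pipeline maps directly onto stock imbalanced-learn components, as sketched below; the dataset here is synthetic, not the student depression data.

```python
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import TomekLinks
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1500, weights=[0.85, 0.15], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

model = Pipeline([
    ("oversample", RandomOverSampler(random_state=0)),  # balance class counts
    ("clean", TomekLinks()),                            # drop borderline/noisy pairs
    ("rf", RandomForestClassifier(random_state=0)),
])
model.fit(Xtr, ytr)
print("accuracy:", round(accuracy_score(yte, model.predict(Xte)), 4))
```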

11 pages, 3156 KiB  
Article
Early Diagnosis of Carotid Stenosis by Ultrasound Doppler Investigations: A Classification Method for the Hemodynamic Parameter
by Huiyue Xiao, Yi Zhang, Hao Yin, Paul Liu and Dong Chyuan Liu
Information 2020, 11(11), 493; https://doi.org/10.3390/info11110493 - 22 Oct 2020
Cited by 2 | Viewed by 1984
Abstract
Pulsed Wave Doppler (PWD) is a traditional ultrasound technique used in the diagnosis of cardiovascular disease. The conventional diagnostic method is based on hemodynamic parameters obtained from the PW spectrum. However, it relies on the clinical experience of sonographers and focuses mainly on severe carotid stenosis. This paper proposes a classification method for the hemodynamic parameter using the RUSBoost algorithm. The proposed method improves the performance of RUSBoost by setting an empirical weight for each sample. Experimental results show that the proposed method reaches an accuracy of 90.1%, a sensitivity of 70%, and a specificity of 94%, which are 4%, 6%, and 2% higher, respectively, than the original RUSBoost. In addition, the proposed method is objective, since the empirical weights are computed from the Mahalanobis distance without any expert input. It can be used for the early detection of cardiovascular disease.
(This article belongs to the Special Issue Data Modeling and Predictive Analytics)
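The key idea—empirical sample weights derived from Mahalanobis distance, fed into RUSBoost—can be sketched as below. The exact weighting formula is our assumption; the paper’s may differ, as may its data.

```python
import numpy as np
from imblearn.ensemble import RUSBoostClassifier
from sklearn.datasets import make_classification

def mahalanobis_weights(X, y):
    """Weight samples by Mahalanobis distance to their own class centroid."""
    w = np.empty(len(y))
    for c in np.unique(y):
        Xc = X[y == c]
        VI = np.linalg.pinv(np.cov(Xc, rowvar=False))   # inverse covariance
        diff = Xc - Xc.mean(axis=0)
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, VI, diff))
        w[y == c] = 1.0 / (1.0 + d)   # closer to centroid => larger weight (assumed)
    return w / w.sum()

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=3)
clf = RUSBoostClassifier(random_state=3)
clf.fit(X, y, sample_weight=mahalanobis_weights(X, y))
print("training accuracy:", clf.score(X, y).round(3))
```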

18 pages, 4574 KiB  
Article
Fusion of Angle Measurements from Hull Mounted and Towed Array Sensors
by Kausar Jahan and Koteswara Rao Sanagapallea
Information 2020, 11(9), 432; https://doi.org/10.3390/info11090432 - 09 Sep 2020
Cited by 10 | Viewed by 2284
Abstract
Two sensor arrays, a hull-mounted array and a towed array, are considered for bearings-only tracking. An algorithm is designed to combine the bearing (angle) measurements obtained from both sensor arrays into a better solution. Using data from two different sensor arrays mitigates the observability problem, so the observer need not follow an S-maneuver to attain observability of the process. The performance of the fusion algorithm is comparable to the theoretical Cramer–Rao lower bound and to that of the algorithm using bearing measurements from a single sensor array. Different filters are used for analyzing both algorithms, and Monte Carlo runs are performed to evaluate the performance of the algorithms more accurately. The performance of the fusion algorithm is also evaluated in terms of solution convergence time.
(This article belongs to the Special Issue Data Modeling and Predictive Analytics)
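A simplified sketch of the fusion idea: one extended Kalman filter whose measurement vector stacks the bearings seen from the hull-mounted and towed arrays. The geometry (a 200 m towed offset), noise levels, and constant-velocity model are all assumptions, and the paper’s filters are more elaborate; the two-array baseline is what lets a non-maneuvering observer retain observability.

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])
Q = 1e-4 * np.eye(4)
R = np.diag([np.deg2rad(1.0) ** 2, np.deg2rad(1.5) ** 2])  # bearing noise (assumed)

hull = np.array([0.0, 0.0])       # hull-mounted array position
towed = np.array([0.0, -200.0])   # towed array 200 m astern (assumed)

def h(x):
    """Stacked bearings (radians from north) to the target from both arrays."""
    return np.array([np.arctan2(x[0] - hull[0], x[1] - hull[1]),
                     np.arctan2(x[0] - towed[0], x[1] - towed[1])])

def H(x):
    """Jacobian of h with respect to the state [px, py, vx, vy]."""
    J = np.zeros((2, 4))
    for r, s in enumerate((hull, towed)):
        dx, dy = x[0] - s[0], x[1] - s[1]
        r2 = dx * dx + dy * dy
        J[r, 0], J[r, 1] = dy / r2, -dx / r2
    return J

rng = np.random.default_rng(5)
target = np.array([3000.0, 5000.0, -5.0, 0.0])   # true target state
x_est = np.array([2500.0, 4500.0, 0.0, 0.0])     # rough initial guess
P = np.diag([1e6, 1e6, 25.0, 25.0])
for k in range(200):
    target = F @ target
    z = h(target) + rng.normal(0, [np.sqrt(R[0, 0]), np.sqrt(R[1, 1])])
    x_est, P = F @ x_est, F @ P @ F.T + Q            # predict
    Hk = H(x_est)
    S = Hk @ P @ Hk.T + R
    K = P @ Hk.T @ np.linalg.inv(S)
    innov = (z - h(x_est) + np.pi) % (2 * np.pi) - np.pi  # wrap angle residuals
    x_est = x_est + K @ innov                        # update
    P = (np.eye(4) - K @ Hk) @ P
print("position error (m):", np.linalg.norm(x_est[:2] - target[:2]).round(1))
```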

12 pages, 1758 KiB  
Article
SVD++ Recommendation Algorithm Based on Backtracking
by Shijie Wang, Guiling Sun and Yangyang Li
Information 2020, 11(7), 369; https://doi.org/10.3390/info11070369 - 21 Jul 2020
Cited by 15 | Viewed by 7800
Abstract
Collaborative filtering (CF) has been successfully applied in personalized recommendation systems. The singular value decomposition (SVD)++ algorithm is an optimized SVD algorithm that enhances prediction accuracy by incorporating implicit feedback. However, the SVD++ algorithm is limited primarily by its low computational efficiency in recommendation. To address this limitation, this study proposes a novel method to accelerate the computation of the SVD++ algorithm, which can also help achieve more accurate recommendation results. The core of the proposed method is to conduct a backtracking line search in the SVD++ algorithm: the recommendation algorithm is optimized by finding, via the backtracking line search, the optimal step along the local gradient of the objective function. The algorithm is compared with conventional CF algorithms on the FilmTrust and MovieLens 1M and 10M public datasets. The effectiveness of the proposed method is demonstrated by comparing root mean square error, mean absolute error, and recall simulation results.
(This article belongs to the Special Issue Data Modeling and Predictive Analytics)
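The accelerating trick—an Armijo backtracking line search on the gradient step—is easy to sketch on a plain matrix-factorization objective, as below. The paper applies it inside SVD++ with implicit feedback, which this toy version omits; the rating matrix and hyper-parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
R = rng.integers(1, 6, size=(30, 20)).astype(float)   # toy rating matrix
k, lam = 4, 0.05
U, V = rng.normal(0, 0.1, (30, k)), rng.normal(0, 0.1, (20, k))

def loss(U, V):
    E = R - U @ V.T
    return 0.5 * (E ** 2).sum() + 0.5 * lam * ((U ** 2).sum() + (V ** 2).sum())

for epoch in range(50):
    E = R - U @ V.T
    gU, gV = -E @ V + lam * U, -E.T @ U + lam * V     # gradients of the objective
    step, f0 = 1.0, loss(U, V)
    g2 = (gU ** 2).sum() + (gV ** 2).sum()
    # Armijo backtracking: shrink the step until sufficient decrease holds.
    while loss(U - step * gU, V - step * gV) > f0 - 1e-4 * step * g2:
        step *= 0.5
    U, V = U - step * gU, V - step * gV
print("final RMSE:", np.sqrt(((R - U @ V.T) ** 2).mean()).round(3))
```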
