Big Data Mining and Analytics with Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (29 February 2024) | Viewed by 9481

Special Issue Editors

Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Interests: big data analytics; AI for databases; database management systems
Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
Interests: data quality management; crowdsourcing; truth discovery

Special Issue Information

Dear Colleagues,

In the Big Data era, the volume of data is growing exponentially and spans a range of cross-media content, including text, images, videos, audio, and time series. Such data are derived from multiple sources, such as sensors, remote sensing, and social networks. Handling them requires Big Data systems on which complex and efficient deep learning models and methods can be built.

Owing to data-driven applications such as voice recognition, object detection, image classification, and machine translation, deep analysis and mining of Big Data are in great demand. Meeting this demand requires advanced techniques as well as novel applications. We therefore invite novel, unpublished research on Big Data analytics and mining techniques and their applications.

This Special Issue will focus on recent theoretical, technical, and application studies for Big Data analytics and mining. Topics include but are not limited to:

(1) Theory and novel application scenarios for big data analytics;

(2) Machine learning techniques for big data analytics and mining;

(3) Big data analysis powered by knowledge;

(4) Big Data mining based on cloud computing platforms;

(5) Novel methods and applications for time series analytics;

(6) Social network analysis and web mining;

(7) Large-scale human activity data analysis;

(8) Social intelligence and personal data analytics.

Prof. Dr. Hongzhi Wang
Prof. Dr. Ye Chen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • data analytics
  • statistical techniques
  • data fusion
  • data classification
  • data extraction
  • cluster analysis
  • data mining
  • machine learning
  • time series
  • social network analysis
  • big data applications

Published Papers (7 papers)

Research

23 pages, 665 KiB  
Article
An Effective Partitional Crisp Clustering Method Using Gradient Descent Approach
by Soroosh Shalileh
Mathematics 2023, 11(12), 2617; https://doi.org/10.3390/math11122617 - 07 Jun 2023
Cited by 1 | Viewed by 1099
Abstract
Enhancing the effectiveness of clustering methods has always been of great interest. Inspired by the success of the gradient descent approach in supervised learning, we propose an effective clustering method based on gradient descent. As a further improvement, we implemented the proposed method with an automatic differentiation library so that users can apply any differentiable distance function. We empirically validated the proposed method and compared its performance with four popular and effective clustering methods from the literature on 11 real-world and 720 synthetic datasets. Our experiments showed that the proposed method is valid and that, in the majority of cases, it is more effective than the competitors.
(This article belongs to the Special Issue Big Data Mining and Analytics with Applications)

22 pages, 410 KiB  
Article
High-Dimensional Variable Selection for Quantile Regression Based on Variational Bayesian Method
by Dengluan Dai, Anmin Tang and Jinli Ye
Mathematics 2023, 11(10), 2232; https://doi.org/10.3390/math11102232 - 10 May 2023
Cited by 1 | Viewed by 1077
Abstract
The quantile regression model is widely used to study relationships between variables in moderately sized data, owing to its strong robustness and its more comprehensive description of the characteristics of the response variable. As data size and dimensionality have increased, high-dimensional quantile regression has been studied under the classical statistical framework, either from a highly efficient frequentist perspective, which comes at the cost of randomness quantification, or with less efficient Bayesian methods based on MCMC sampling. To overcome these problems, we propose high-dimensional quantile regression with a spike-and-slab lasso penalty based on variational Bayes (VBSSLQR), which not only improves computational efficiency but also measures randomness via variational distributions. Simulation studies and real data analysis showed that the proposed VBSSLQR method was superior or equivalent to other quantile and nonquantile regression methods (including Bayesian and non-Bayesian methods), and that its efficiency was higher than that of any other method.

16 pages, 2624 KiB  
Article
The Classification of Application Users Supporting and Facilitating Travel Mobility Using Two-Step Cluster Analysis
by Jaroslav Mašek, Vladimíra Štefancová, Jaroslav Mazanec and Petra Juránková
Mathematics 2023, 11(9), 2192; https://doi.org/10.3390/math11092192 - 06 May 2023
Viewed by 1007
Abstract
There is a significant, well-supported trend toward ensuring continuous door-to-door travel in the pan-European transport network. Many innovative programs are dedicated to this topic through assigned projects. This paper is based on concrete partial results of the H2020 project Shift2Rail IP4 to support the deployment of mobility as a service (IP4MaaS). Attitudes towards travel for demonstration sites were assessed based on the outputs of a sample of respondents from two countries. Partners from Slovakia (UNIZA) and the Czech Republic (OLTIS) also cooperated on the IP4MaaS project. Mathematical statistical tools were used to evaluate the available data and to find a connection with promoting mobility as a service. This paper uses two-step cluster analysis to identify differences in travelers’ needs, with a focus on the use of applications. The research identified differences in traffic behavior within MaaS activities when comparing clusters that reflect preferences for using a website or a mobile application.

15 pages, 837 KiB  
Article
Column-Type Prediction for Web Tables Powered by Knowledge Base and Text
by Junyi Wu, Chen Ye, Haoshi Zhi and Shihao Jiang
Mathematics 2023, 11(3), 560; https://doi.org/10.3390/math11030560 - 20 Jan 2023
Viewed by 1162
Abstract
Web tables are essential for applications such as data analysis. However, web tables are often incomplete and lack critical information, which makes their content challenging to understand. Automatically predicting column types for tables without metadata is important for handling the wide variety of tables on the Internet. This paper proposes a CNN-Text method for this task, which fuses CNN prediction and voting processes. We present data augmentation and synthetic column generation approaches to improve the CNN’s performance and use extracted text to obtain better predictions. The experimental results show that CNN-Text outperforms the baseline methods, demonstrating that it is well suited to table column type prediction.

15 pages, 3872 KiB  
Article
A Modified γ-Sutte Indicator for Air Quality Index Prediction
by Dong-Her Shih, To Thi Hien, Ly Sy Phu Nguyen, Ting-Wei Wu and Yen-Ting Lai
Mathematics 2022, 10(17), 3060; https://doi.org/10.3390/math10173060 - 25 Aug 2022
Cited by 1 | Viewed by 1315
Abstract
Air pollution has become an essential issue in environmental protection. The Air Quality Index (AQI) is often used to determine the severity of air pollution. When the AQI reaches the red level, the proportion of asthma patients seeking medical treatment increases by 30% over the usual level. If the AQI can be predicted in advance, the benefits of early warning can be achieved. In recent years, an α-Sutte indicator has been proposed that performs well in time series prediction. However, the calculation of the α-Sutte indicator uses a fixed weight. A β-Sutte indicator using a dynamic weight subsequently appeared, but its computational complexity and required sliding window are high compared to the α-Sutte indicator. In this study, a modified γ-Sutte indicator, using a dynamic weight with a lower computational cost than the β-Sutte indicator, is proposed. To show that the proposed γ-Sutte indicator generalizes well and is transferable, this study uses data from different regions and periods to predict the AQI. The results showed that the prediction accuracy of the proposed γ-Sutte indicator was better than that of other methods.

17 pages, 934 KiB  
Article
A Two-Stage Hybrid Extreme Learning Model for Short-Term Traffic Flow Forecasting
by Zhihan Cui, Boyu Huang, Haowen Dou, Yan Cheng, Jitian Guan and Teng Zhou
Mathematics 2022, 10(12), 2087; https://doi.org/10.3390/math10122087 - 16 Jun 2022
Cited by 10 | Viewed by 1389
Abstract
Credible and accurate traffic flow forecasting is critical for deploying intelligent traffic management systems. Nevertheless, it remains challenging to develop a robust and efficient forecasting model because of the nonlinear characteristics and inherent stochasticity of traffic flow. To address the nonlinear relationships in traffic flow across different scenarios, we propose a two-stage hybrid extreme learning model for short-term traffic flow forecasting. In the first stage, the particle swarm optimization algorithm is employed to determine the initial population distribution of the gravitational search algorithm, improving the efficiency of the search for the global optimum. In the second stage, the results of the previous stage, rather than the network structure parameters randomly generated by the extreme learning machine, are used to train the hybrid forecasting model in a data-driven fashion. We evaluated the trained model on four real-world benchmark datasets from highways A1, A2, A4, and A8 connecting the Amsterdam ring road. The RMSEs of the proposed model are 288.03, 204.09, 220.52, and 163.92, respectively, and the MAPEs are 11.53%, 10.16%, 11.67%, and 12.02%, respectively. Experimental results demonstrate the superior performance of our proposed model.

Review

22 pages, 361 KiB  
Review
Multi-Source Data Repairing: A Comprehensive Survey
by Chen Ye, Haoyang Duan, Hengtong Zhang, Hua Zhang, Hongzhi Wang and Guojun Dai
Mathematics 2023, 11(10), 2314; https://doi.org/10.3390/math11102314 - 16 May 2023
Cited by 1 | Viewed by 1053
Abstract
In the era of Big Data, integrating information from multiple sources has proven valuable in various fields. To ensure a high-quality supply of multi-source data, repairing different types of errors in the multi-source data becomes critical. This paper categorizes errors in multi-source data into entity information overlapping, attribute value conflicts, and attribute value inconsistencies. We first summarize existing repairing methods for these errors and then examine and review the study of the detection and repair of compound-type errors in multi-source data. Finally, we indicate further research directions in multi-source data repair.