Search Results (13)

Search Parameters:
Keywords = small tabular dataset

12 pages, 917 KB  
Article
Time Sequence Deep Learning Model for Ubiquitous Tabular Data with Unique 3D Tensors Manipulation
by Adaleta Gicic, Dženana Đonko and Abdulhamit Subasi
Entropy 2024, 26(9), 783; https://doi.org/10.3390/e26090783 - 12 Sep 2024
Cited by 2 | Viewed by 1792
Abstract
Although deep learning (DL) algorithms have proved effective in diverse research domains, their application in developing models for tabular data remains limited. Models trained on tabular data demonstrate higher efficacy using traditional machine learning models than DL models, which is largely attributed to the size and structure of tabular datasets and the specific application contexts in which they are utilized. Thus, the primary objective of this paper is to propose a method that exploits the strength of Stacked Bidirectional LSTM (Long Short-Term Memory) deep learning algorithms in pattern discovery, feeding neural networks with tabular data through customized 3D tensor modeling. Our findings are empirically validated using six diverse, publicly available datasets, each varying in size and learning objectives. This paper proves that the proposed model based on time-sequence DL algorithms, which were generally described as inadequate when dealing with tabular data, yields satisfactory results and competes effectively with other algorithms specifically designed for tabular data. An additional benefit of this approach is its ability to preserve simplicity while ensuring fast model training, even with large datasets. Even with extremely small datasets, models can be applied to achieve exceptional predictive results and fully utilize their capacity.
(This article belongs to the Section Multidisciplinary Applications)

23 pages, 5443 KB  
Article
An Intelligent Financial Fraud Detection Support System Based on Three-Level Relationship Penetration
by Xiang Li, Lei Chu, Yujun Li, Zhanjun Xing, Fengqian Ding, Jintao Li and Ben Ma
Mathematics 2024, 12(14), 2195; https://doi.org/10.3390/math12142195 - 12 Jul 2024
Cited by 2 | Viewed by 2349
Abstract
Financial fraud is a serious challenge in a rapidly evolving digital economy that places increasing demands on detection systems. However, traditional methods are often limited by the dimensional information of the corporations themselves and are insufficient to deal with the complexity and dynamics of modern financial fraud. This study introduces a novel intelligent financial fraud detection support system that leverages a three-level relationship penetration (3-LRP) method to decode complex fraudulent networks and enhance prediction accuracy. The system integrates the fuzzy rough density-based feature selection (FRDFS) methodology, which optimizes feature screening in noisy financial environments, with the fuzzy deterministic soft voting (FDSV) method, which combines transformer-based deep tabular networks with conventional machine learning classifiers. The integration of FRDFS optimizes feature selection, significantly improving the system’s reliability and performance. An empirical analysis, using a real financial dataset from Chinese small and medium-sized enterprises (SMEs), demonstrates the effectiveness of our proposed method. This research enriches the financial fraud detection literature and provides practical insights for risk management professionals, introducing a comprehensive framework for early warning and proactive risk management in digital finance.

25 pages, 3447 KB  
Article
Optimizing Rare Disease Gait Classification through Data Balancing and Generative AI: Insights from Hereditary Cerebellar Ataxia
by Dante Trabassi, Stefano Filippo Castiglia, Fabiano Bini, Franco Marinozzi, Arash Ajoudani, Marta Lorenzini, Giorgia Chini, Tiwana Varrecchia, Alberto Ranavolo, Roberto De Icco, Carlo Casali and Mariano Serrao
Sensors 2024, 24(11), 3613; https://doi.org/10.3390/s24113613 - 3 Jun 2024
Cited by 25 | Viewed by 2803
Abstract
The interpretability of gait analysis studies in people with rare diseases, such as those with primary hereditary cerebellar ataxia (pwCA), is frequently limited by small sample sizes and unbalanced datasets. The purpose of this study was to assess the effectiveness of data balancing and generative artificial intelligence (AI) algorithms in generating synthetic data reflecting the actual gait abnormalities of pwCA. Gait data of 30 pwCA (age: 51.6 ± 12.2 years; 13 females, 17 males) and 100 healthy subjects (age: 57.1 ± 10.4 years; 60 females, 40 males) were collected at the lumbar level with an inertial measurement unit. Subsampling, oversampling, synthetic minority oversampling, generative adversarial networks, and conditional tabular generative adversarial networks (ctGAN) were applied to generate datasets to be input to a random forest classifier. Consistency and explainability metrics were also calculated to assess the coherence of the generated dataset with known gait abnormalities of pwCA. ctGAN significantly improved the classification performance compared with the original dataset and traditional data augmentation methods. ctGANs are an effective method for balancing tabular datasets from populations with rare diseases, owing to their ability to improve diagnostic models with consistent explainability.
(This article belongs to the Special Issue Feature Papers in Wearables 2024)

30 pages, 7204 KB  
Article
COVID-19 Hierarchical Classification Using a Deep Learning Multi-Modal
by Albatoul S. Althenayan, Shada A. AlSalamah, Sherin Aly, Thamer Nouh, Bassam Mahboub, Laila Salameh, Metab Alkubeyyer and Abdulrahman Mirza
Sensors 2024, 24(8), 2641; https://doi.org/10.3390/s24082641 - 20 Apr 2024
Cited by 8 | Viewed by 2941
Abstract
Coronavirus disease 2019 (COVID-19), originating in China, has rapidly spread worldwide. Physicians must examine infected patients and make timely decisions to isolate them. However, completing these processes is difficult due to limited time and availability of expert radiologists, as well as limitations of the reverse-transcription polymerase chain reaction (RT-PCR) method. Deep learning, a sophisticated machine learning technique, leverages radiological imaging modalities for disease diagnosis and image classification tasks. Previous research on COVID-19 classification has encountered several limitations, including binary classification methods, single-feature modalities, small public datasets, and reliance on CT diagnostic processes. Additionally, studies have often utilized a flat structure, disregarding the hierarchical structure of pneumonia classification. This study aims to overcome these limitations by identifying pneumonia caused by COVID-19, distinguishing it from other types of pneumonia and healthy lungs using chest X-ray (CXR) images and related tabular medical data, and to demonstrate the value of incorporating tabular medical data in achieving more accurate diagnoses. Resnet-based and VGG-based pre-trained convolutional neural network (CNN) models were employed to extract features, which were then combined using early fusion for the classification of eight distinct classes. We leveraged the hierarchical structure of pneumonia classification within our approach to achieve improved classification outcomes. Since an imbalanced dataset is common in this field, a variety of versions of generative adversarial networks (GANs) were used to generate synthetic data. The proposed approach, tested on our private datasets of 4523 patients, achieved a macro-avg F1-score of 95.9% and an F1-score of 87.5% for COVID-19 identification using a Resnet-based structure. In conclusion, in this study, we were able to create an accurate multi-modal deep learning model to diagnose COVID-19 and differentiate it from other kinds of pneumonia and normal lungs, which will enhance the radiological diagnostic process.
(This article belongs to the Special Issue Advanced Deep Learning for Biomedical Sensing and Imaging)

18 pages, 12435 KB  
Article
Fault Diagnosis Method of Box-Type Substation Based on Improved Conditional Tabular Generative Adversarial Network and AlexNet
by Yong Liu, Jialin Zhou, Dong Zhang, Shaoyu Wei, Mingshun Yang and Xinqin Gao
Appl. Sci. 2024, 14(7), 3112; https://doi.org/10.3390/app14073112 - 8 Apr 2024
Cited by 6 | Viewed by 1274
Abstract
To solve the problem of low diagnostic accuracy caused by the scarcity of fault samples and class imbalance in the fault diagnosis task of box-type substations, a fault diagnosis method based on self-attention improvement of conditional tabular generative adversarial network (CTGAN) and AlexNet was proposed. The self-attention mechanism is introduced into the generator of CTGAN to maintain the correlation between the indicators of the input data, and large amounts of high-quality data are generated from the small number of fault samples. The generated data are input into the AlexNet model for fault diagnosis. The experimental results demonstrate that compared with the SMOTE and CTGAN methods, the dataset generated by the self-attention-conditional tabular generative adversarial network (SA-CTGAN) model has better data relevance. The accuracy of fault diagnosis by the proposed method reaches 94.81%, which is improved by about 11% compared with the model trained on the original data.
(This article belongs to the Section Applied Physics General)

24 pages, 2626 KB  
Article
An Interpretable Machine Learning Framework for Rare Disease: A Case Study to Stratify Infection Risk in Pediatric Leukemia
by Irfan Al-Hussaini, Brandon White, Armon Varmeziar, Nidhi Mehra, Milagro Sanchez, Judy Lee, Nicholas P. DeGroote, Tamara P. Miller and Cassie S. Mitchell
J. Clin. Med. 2024, 13(6), 1788; https://doi.org/10.3390/jcm13061788 - 20 Mar 2024
Cited by 6 | Viewed by 2704
Abstract
Background: Datasets on rare diseases, like pediatric acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), have small sample sizes that hinder machine learning (ML). The objective was to develop an interpretable ML framework to elucidate actionable insights from small tabular rare disease datasets. Methods: The comprehensive framework employed optimized data imputation and sampling, supervised and unsupervised learning, and literature-based discovery (LBD). The framework was deployed to assess treatment-related infection in pediatric AML and ALL. Results: An interpretable decision tree classified the risk of infection as either “high risk” or “low risk” in pediatric ALL (n = 580) and AML (n = 132) with an accuracy of ∼79%. Interpretable regression models predicted the discrete number of developed infections with a mean absolute error (MAE) of 2.26 for bacterial infections and an MAE of 1.29 for viral infections. Features that best explained the development of infection were the chemotherapy regimen, cancer cells in the central nervous system at initial diagnosis, chemotherapy course, leukemia type, Down syndrome, race, and National Cancer Institute risk classification. Finally, SemNet 2.0, an open-source LBD software that links relationships from 33+ million PubMed articles, identified additional features for the prediction of infection, like glucose, iron, neutropenia-reducing growth factors, and systemic lupus erythematosus (SLE). Conclusions: The developed ML framework enabled state-of-the-art, interpretable predictions using rare disease tabular datasets. ML model performance baselines were successfully produced to predict infection in pediatric AML and ALL.
(This article belongs to the Special Issue Advances in Pediatric Leukemia)

29 pages, 6070 KB  
Article
A Multi-Agent Intrusion Detection System Optimized by a Deep Reinforcement Learning Approach with a Dataset Enlarged Using a Generative Model to Reduce the Bias Effect
by Matthieu Mouyart, Guilherme Medeiros Machado and Jae-Yun Jun
J. Sens. Actuator Netw. 2023, 12(5), 68; https://doi.org/10.3390/jsan12050068 - 18 Sep 2023
Cited by 8 | Viewed by 4008
Abstract
Intrusion detection systems can perform poorly when they are tuned with datasets that are unbalanced in terms of attack data and non-attack data. Most datasets contain more non-attack data than attack data, and this circumstance can introduce biases in intrusion detection systems, making them vulnerable to cyberattacks. As an approach to remedy this issue, we considered the Conditional Tabular Generative Adversarial Network (CTGAN), with its hyperparameters optimized using the tree-structured Parzen estimator (TPE), to balance an insider threat tabular dataset called the CMU-CERT, which is formed by discrete-value and continuous-value columns. We showed through this method that the mean absolute errors between the probability mass functions (PMFs) of the actual data and the PMFs of the data generated using the CTGAN can be relatively small. Then, from the optimized CTGAN, we generated synthetic insider threat data and combined them with the actual ones to balance the original dataset. We used the resulting dataset for an intrusion detection system implemented with the Adversarial Environment Reinforcement Learning (AE-RL) algorithm in a multi-agent framework formed by an attacker and a defender. We showed that the performance of detecting intrusions using the framework of the CTGAN and the AE-RL is significantly improved with respect to the case where the dataset is not balanced, giving an F1-score of 0.7617.
(This article belongs to the Special Issue Machine-Environment Interaction, Volume II)

20 pages, 5675 KB  
Article
Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP
by Winston Wang and Tun-Wen Pai
Data 2023, 8(9), 135; https://doi.org/10.3390/data8090135 - 23 Aug 2023
Cited by 17 | Viewed by 5717
Abstract
This study addressed the challenge of training generative adversarial networks (GANs) on small tabular clinical trial datasets for data augmentation, which are known to pose difficulties in training due to limited sample sizes. To overcome this obstacle, a hybrid approach is proposed: the synthetic minority oversampling technique (SMOTE) first augments the original data to a more substantial size, which improves the subsequent training of a Wasserstein conditional generative adversarial network with gradient penalty (WCGAN-GP), proven for its state-of-the-art performance and enhanced stability. The ultimate objective of this research was to demonstrate that the quality of synthetic tabular data generated by the final WCGAN-GP model maintains the structural integrity and statistical representation of the original small dataset using this hybrid approach. This focus is particularly relevant for clinical trials, where limited data availability due to privacy concerns and restricted accessibility to subject enrollment pose common challenges. Despite the limitation of data, the findings demonstrate that the hybrid approach successfully generates synthetic data that closely preserve the characteristics of the original small dataset. By harnessing the power of this hybrid approach to generate faithful synthetic data, the potential for enhancing data-driven research in drug clinical trials becomes evident. This includes enabling a robust analysis on small datasets, supplementing the lack of clinical trial data, facilitating its utility in machine learning tasks, and even extending to using the model for anomaly detection to ensure better quality control during clinical trial data collection, all while prioritizing data privacy and implementing strict data protection measures.
(This article belongs to the Topic Machine Learning Techniques Driven Medicine Analysis)

16 pages, 553 KB  
Article
Multiple Instance Learning with Trainable Soft Decision Tree Ensembles
by Andrei Konstantinov, Lev Utkin and Vladimir Muliukha
Algorithms 2023, 16(8), 358; https://doi.org/10.3390/a16080358 - 26 Jul 2023
Cited by 3 | Viewed by 2389
Abstract
A new random forest-based model for solving the Multiple Instance Learning problem under small tabular data, called the Soft Tree Ensemble Multiple Instance Learning, is proposed. A new type of soft decision tree is considered, which is similar to the well-known soft oblique trees, but with a smaller number of trainable parameters. In order to train the trees, it is proposed to convert them into neural networks of a specific form, which approximate the tree functions. It is also proposed to aggregate the instance and bag embeddings (output vectors) by using the attention mechanism. The whole Soft Tree Ensemble Multiple Instance Learning model, including soft decision trees, neural networks, the attention mechanism and a classifier, is trained in an end-to-end manner. Numerical experiments with well-known real tabular datasets show that the proposed model can outperform many existing multiple instance learning models. Code implementing the model is publicly available.
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

19 pages, 4955 KB  
Article
Qualitative and Quantitative Evaluation of Multivariate Time-Series Synthetic Data Generated Using MTS-TGAN: A Novel Approach
by Parul Yadav, Manish Gaur, Nishat Fatima and Saqib Sarwar
Appl. Sci. 2023, 13(7), 4136; https://doi.org/10.3390/app13074136 - 24 Mar 2023
Cited by 12 | Viewed by 4163
Abstract
To obtain high performance, generalization, and accuracy in machine learning applications, such as prediction or anomaly detection, large datasets are a necessary prerequisite. However, the collection of data is time-consuming, difficult, and expensive for many imbalanced or small datasets. These challenges are evident in collecting data for financial and banking services, pharmaceuticals and healthcare, manufacturing, automobiles, robotic cars, sensor time-series data, and many more. To overcome the challenges of data collection, researchers in many domains are becoming more and more interested in the development or generation of synthetic data. Generating synthetic time-series data is far more complicated and expensive than generating synthetic tabular data. The primary objective of the paper is to generate synthetic multivariate time-series data (for continuous and mixed parameters) that are comparable to, and evaluated against, real multivariate time-series data. A novel GAN architecture named MTS-TGAN is proposed, trained to produce such data, and then assessed using both qualitative measures, namely t-SNE, PCA, and discriminative and predictive scores, and quantitative measures, for which an RNN model is implemented that calculates MAE and MSLE scores for three training phases: Train Real Test Real, Train Real Test Synthetic, and Train Synthetic Test Real. The model is able to reduce the overall error by up to 13% and 10% in predictive and discriminative scores, respectively. The research’s objectives are met, and the outcomes demonstrate that MTS-TGAN is able to pick up on the distribution and underlying knowledge included in the attributes of the real data and that it can serve as a starting point for additional research in the respective area.
(This article belongs to the Special Issue Big Data Security and Privacy in Internet of Things)

17 pages, 718 KB  
Article
CTTGAN: Traffic Data Synthesizing Scheme Based on Conditional GAN
by Jiayu Wang, Xuehu Yan, Lintao Liu, Longlong Li and Yongqiang Yu
Sensors 2022, 22(14), 5243; https://doi.org/10.3390/s22145243 - 13 Jul 2022
Cited by 20 | Viewed by 4231
Abstract
Most machine learning algorithms only have a good recognition rate on balanced datasets. However, in the field of malicious traffic identification, benign traffic on the network is far greater than malicious traffic, and the network traffic dataset is imbalanced, which gives the algorithm a low identification rate for small categories of malicious traffic samples. This paper presents a traffic sample synthesizing model named Conditional Tabular Traffic Generative Adversarial Network (CTTGAN), which uses a Conditional Tabular Generative Adversarial Network (CTGAN) algorithm to expand the small category traffic samples and balance the dataset in order to improve the malicious traffic identification rate. The CTTGAN model expands and recognizes feature data, which meets the requirements of a machine learning algorithm for training and prediction data. The contributions of this paper are as follows: first, the small category samples are expanded and the traffic dataset is balanced; second, the storage cost and computational complexity are reduced compared to models using image data; third, discrete variables and continuous variables in traffic feature data are processed at the same time, and the data distribution is described well. The experimental results show that the recognition rate of the expanded samples is more than 0.99 in MLP, KNN and SVM algorithms. In addition, the recognition rate of the proposed CTTGAN model is better than that of the oversampling and undersampling schemes.
(This article belongs to the Special Issue Cyber Security and AI)

26 pages, 80832 KB  
Article
Topographic Analysis of Intertidal Polychaete Reefs (Sabellaria alveolata) at a Very High Spatial Resolution
by Guillaume Brunier, Simon Oiry, Yves Gruet, Stanislas F. Dubois and Laurent Barillé
Remote Sens. 2022, 14(2), 307; https://doi.org/10.3390/rs14020307 - 10 Jan 2022
Cited by 22 | Viewed by 4556
Abstract
In temperate coastal regions of Western Europe, the polychaete Sabellaria alveolata (Linné) builds large intertidal reefs of several hectares on soft-bottom substrates. These reefs are protected by the European Habitat Directive EEC/92/43 under the status of biogenic structures hosting a high biodiversity and providing ecological functions such as protection against coastal erosion. As an alternative to time-consuming field campaigns, a UAV-based Structure-from-Motion photogrammetric survey was carried out in October 2020 over Noirmoutier Island (France) where the second-largest known European reef is located in a tidal delta. A DJI Phantom 4 Multispectral UAV provided a topographic dataset at very high resolutions of 5 cm/pixel for the Digital Surface Model (DSM) and 2.63 cm/pixel for the multispectral orthomosaic images. The reef footprint was mapped using a combination of two topographic indices: the Topographic Openness Index and the Topographic Position Index. The reef structures covered an area of 8.15 ha, with 89% corresponding to the main reef composed of connected and continuous biogenic structures, 7.6% of large isolated structures (<60 m²), and 4.4% of small isolated reef clumps (<2 m²). To further describe the topographic complexity of the reef, the Geomorphon landform classification was used. The spatial distribution of tabular platforms, considered as a healthy stage of the reef in contrast to a degraded stage, was mapped with a proxy that consists of comparing the reef volume to a theoretical tabular-shaped reef volume. Epibionts colonizing the reef (macroalgae, mussels, and oysters) were also mapped by combining multispectral indices such as the Normalised Difference Vegetation Index and simple band ratios with topographic indices. A confusion matrix showed that macroalgae and mussels were satisfactorily identified but that oysters could not be detected by an automated procedure due to their spectral complexity. The topographic indices used in this work should now be further exploited to propose a health index for these large intertidal reefs.
(This article belongs to the Section Ecological Remote Sensing)

10 pages, 274 KB  
Article
Study on the Impact of Institutions on the Labor Productivity of Private Enterprises in Vietnam through the Spillover Effect from State-Owned Enterprises
by Hong-Nham Nguyen Thi, Hong-Thuy Le Thi and The-Dong Phung
Economies 2021, 9(3), 122; https://doi.org/10.3390/economies9030122 - 28 Aug 2021
Cited by 1 | Viewed by 3327
Abstract
The paper analyzes the impact of institutions on the labor productivity of small and medium-sized private enterprises through the spillover effect from state-owned enterprises (SOEs). The authors used data samples from three datasets: (i) The Annual Enterprise Survey conducted by the General Statistics Office of Vietnam (GSO) from 2010 to 2018; (ii) Institutional data (PCI) published by the Vietnam Chamber of Commerce and Industry (VCCI) from 2010 to 2018; (iii) GSO 2012 I-O balance sheet and a set of tabular data containing 666,221 observations at the enterprise and provincial levels in Vietnam from 2010 to 2018, including both listed and unlisted enterprises. The model’s experimental results show that institutional improvement boosts the labor productivity of domestic private enterprises through horizontal and forward spillover channels from SOEs. Through the backward spillover channel from SOEs, how institutional improvement affects labor productivity depends on the degree of backward spillover.
(This article belongs to the Special Issue Nexus between Politics and Economics in the Emerging Countries)