
Big Data Cogn. Comput., Volume 5, Issue 2 (June 2021) – 12 articles

Cover Story: The road to fully trustworthy Artificial Intelligence (AI) includes the need to track and capture every aspect of the production of an AI model: not only the provenance of data, processes, and artifacts, but also the researchers' judgement and reasoning behind their decisions. Here, we review the existing tools and data models for traceability, finding that they differ in capability and maturity but lack a common approach to the task. Finally, we propose a set of minimal requirements for considering a model traceable according to the High-Level Expert Group on AI guidelines, and we point out several directions in which further work is needed toward fully traceable model-based decisions.
Article
Near-Real-Time IDS for the U.S. FAA’s NextGen ADS-B
Big Data Cogn. Comput. 2021, 5(2), 27; https://doi.org/10.3390/bdcc5020027 - 16 Jun 2021
Cited by 1 | Viewed by 2795
Abstract
Modern-day aircraft are flying computer networks, vulnerable to ground station flooding, ghost aircraft injection or flooding, aircraft disappearance, virtual trajectory modifications or false alarm attacks, and aircraft spoofing. This work lays out a data mining process, in the context of big data, to determine flight patterns, including patterns for possible attacks, in the U.S. National Air Space (NAS). Flights outside the flight patterns are possible attacks. For this study, OpenSky was used as the data source of Automatic Dependent Surveillance-Broadcast (ADS-B) messages, NiFi was used for data management, Elasticsearch was used as the log analyzer, Kibana was used to visualize the data for feature selection, and a Support Vector Machine (SVM) was used for classification. This research provides a solution for attack mitigation by packaging a machine learning algorithm, an SVM, into an intrusion detection system and calculating the feasibility of processing U.S. ADS-B messages in near real time. Results of this work show that ADS-B network attacks can be detected using network attack signatures, and volume and velocity calculations show that ADS-B messages are processable at the scale of the U.S. Next Generation (NextGen) Air Traffic Systems using commodity hardware, facilitating real-time attack detection. Precision and recall close to 80% were obtained using the SVM.
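The detection step described in the abstract can be sketched with a toy SVM classifier. The two features below (message rate and position jump) are illustrative stand-ins for whatever features the authors actually selected in Kibana, not their real feature set:

```python
# Minimal sketch: classifying ADS-B traffic as normal or attack with an SVM.
# Feature names and thresholds are hypothetical, for illustration only.
from sklearn.svm import SVC

# Synthetic training data: [message_rate_hz, position_jump_km]
X_train = [
    [1.0, 0.1], [1.2, 0.2], [0.9, 0.15], [1.1, 0.05],    # normal flights
    [50.0, 0.1], [80.0, 0.3], [1.0, 40.0], [1.1, 55.0],  # flooding / ghost jumps
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = normal, 1 = attack signature

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

def is_attack(message_rate_hz, position_jump_km):
    """Flag a flight whose behavior matches a known attack signature."""
    return bool(clf.predict([[message_rate_hz, position_jump_km]])[0])
```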
Show Figures

Figure 1

Article
Which Way to Cope with COVID-19 Challenges? Contributions of the IoT for Smart City Projects
Big Data Cogn. Comput. 2021, 5(2), 26; https://doi.org/10.3390/bdcc5020026 - 16 Jun 2021
Viewed by 2968
Abstract
Many activities and sectors have come to a halt due to the COVID-19 crisis. People's and workers' habits and behaviors have changed dramatically, as the use of technologies and connections, virtual reality, and remote support has grown. Businesses and cities have been forced to adapt quickly to the new challenges. Digital technologies have given people better access to public services through improved use of resources. Smart cities have significant potential for linking people to work and services as never before. Additionally, technological convergence produces data that can enhance interactions and decisions toward the "new normal". In this paper, the aim is to assess how prepared Portugal is to respond to the accelerated process that this context demands from cities. Portuguese SMEs have developed a good capacity for entrepreneurship and innovation; however, they still lag in converting the knowledge acquired into sales and exports, and collaboration at the public-private level remains limited. The acceleration of smart cities through the Internet of Things (IoT) may encourage changes in these areas. A more assertive alignment between emergent technologies and the digitization goals of companies is required. This paper opens a discussion around the major needs and trends of IoT (and related technologies) that the pandemic has accelerated. The relationship between innovation and city smartness is examined to assess the main contributing and limiting variables (through the European Innovation Scoreboard) and to clarify future directions toward smarter services. The tourism sector, as the country's largest export economic activity, is addressed in this matter. An analytical framework (using, for example, Power BI and Azure IoT Hub) built around this approach can help select and support the most suitable areas of development in the country.
(This article belongs to the Special Issue Internet of Things (IoT) and Ambient Intelligence)
Show Figures

Figure 1

Article
Structural Differences of the Semantic Network in Adolescents with Intellectual Disability
Big Data Cogn. Comput. 2021, 5(2), 25; https://doi.org/10.3390/bdcc5020025 - 01 Jun 2021
Viewed by 3627
Abstract
The semantic network structure is a core aspect of the mental lexicon and is, therefore, a key to understanding language development processes. This study investigated the structure of the semantic network of adolescents with intellectual disability (ID) and children with typical development (TD) using network analysis. The semantic networks of the participants (nID = 66; nTD = 49) were estimated from the semantic verbal fluency task with the pathfinder method. The groups were matched on the number of produced words. The average shortest path length (ASPL), the clustering coefficient (CC), and the network’s modularity (Q) of the two groups were compared. A significantly smaller ASPL and Q and a significantly higher CC were found for the adolescents with ID in comparison with the children with TD. Reasons for this might be differences in the language environment and differences in cognitive skills. The quality and quantity of the language input might differ for adolescents with ID due to differences in school curricula and because persons with ID tend to engage in different out-of-school activities compared to TD peers. Future studies should investigate the influence of different language environments on the language development of persons with ID.
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
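Two of the three metrics compared in this study, the ASPL and the CC, can be computed directly from an adjacency list with only the standard library. The small word-association network below is invented for illustration and is not taken from the study's data:

```python
# Illustrative sketch: average shortest path length (ASPL) and average
# clustering coefficient (CC) on a toy undirected semantic network.
from collections import deque
from itertools import combinations

graph = {  # hypothetical word-association network (a simple path here)
    "dog": {"cat", "bone"},
    "cat": {"dog", "mouse"},
    "mouse": {"cat", "cheese"},
    "cheese": {"mouse"},
    "bone": {"dog"},
}

def shortest_path_length(g, source, target):
    """Breadth-first-search distance between two nodes."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in g[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("nodes are not connected")

def aspl(g):
    """Mean shortest-path length over all node pairs (connected graph)."""
    pairs = list(combinations(g, 2))
    return sum(shortest_path_length(g, a, b) for a, b in pairs) / len(pairs)

def clustering_coefficient(g):
    """Mean fraction of each node's neighbour pairs that are themselves linked."""
    coeffs = []
    for node, nbrs in g.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(g)
```

A denser, more clustered network yields a smaller ASPL and a higher CC, which is the direction of the difference the study reports for the ID group.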
Show Figures

Figure 1

Article
Without Data Quality, There Is No Data Migration
Big Data Cogn. Comput. 2021, 5(2), 24; https://doi.org/10.3390/bdcc5020024 - 18 May 2021
Viewed by 2707
Abstract
Data migration is required to run data-intensive applications. Legacy data storage systems are not capable of accommodating the changing nature of data. In many companies, data migration projects fail because their importance and complexity are not taken seriously enough. Data migration strategies include storage migration, database migration, application migration, and business process migration. Regardless of which migration strategy a company chooses, there should always be a strong focus on data cleansing. Complete, correct, and clean data not only reduce the cost, complexity, and risk of the changeover; they also form a good basis for quick, strategic company decisions and are therefore an essential foundation for today’s dynamic business processes. Data quality is an important issue for companies planning data migration and should not be overlooked. To determine the relationship between data quality and data migration, an empirical study of 25 large German and Swiss companies was carried out to establish how important data quality is to companies during data migration. In this paper, we present our findings on how data quality plays an important role in data migration plans and must not be ignored. Without acceptable data quality, data migration is impossible.
(This article belongs to the Special Issue Educational Data Mining and Technology)
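As a rough illustration of the pre-migration data cleansing the study advocates, a minimal quality report might count incomplete, invalid, and duplicate records before any data are moved. The field names and validation rules below are hypothetical:

```python
# Illustrative pre-migration data-quality checks on a toy legacy table:
# completeness, validity, and duplicate detection. Rules are made up.
import re

records = [
    {"id": 1, "name": "Acme GmbH", "email": "info@acme.example"},
    {"id": 2, "name": "",          "email": "sales@beta.example"},   # incomplete
    {"id": 3, "name": "Acme GmbH", "email": "info@acme.example"},   # duplicate
    {"id": 4, "name": "Gamma AG",  "email": "not-an-email"},        # invalid
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def quality_report(rows):
    """Count records that would break a naive migration."""
    incomplete = sum(1 for r in rows if not r["name"])
    invalid = sum(1 for r in rows if not EMAIL_RE.match(r["email"]))
    seen, duplicates = set(), 0
    for r in rows:
        key = (r["name"], r["email"])
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"incomplete": incomplete, "invalid": invalid, "duplicates": duplicates}
```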
Show Figures

Figure 1

Article
Assessment of Cybersecurity Awareness among Students of Majmaah University
Big Data Cogn. Comput. 2021, 5(2), 23; https://doi.org/10.3390/bdcc5020023 - 10 May 2021
Cited by 4 | Viewed by 4404
Abstract
Information exchange has become increasingly fast and efficient through the use of recent technological advances, such as instant messaging and social media platforms. Consequently, access to information has become easier. However, new types of cybersecurity threats that typically result in data loss and information misuse have emerged simultaneously. Therefore, maintaining data privacy in complex systems is important and necessary, particularly in organizations where the vast majority of individuals interacting with these systems are students. In most cases, students engage in data breaches and digital misconduct due to a lack of knowledge and awareness of cybersecurity and the consequences of cybercrime. The aim of this study was to investigate and evaluate the level of cybersecurity awareness and user compliance among undergraduate students at Majmaah University using a scientific questionnaire based on several safety factors for Internet use. We quantitatively evaluated students' knowledge of cybercrime and protection to show the need for user education, training, and awareness. In this study, we used a quantitative research methodology and conducted different statistical tests, such as ANOVA and the Kaiser–Meyer–Olkin (KMO) and Bartlett’s tests, to evaluate and analyze the hypotheses. Safety concerns regarding e-mail, computer viruses, phishing, forged ads, pop-up windows, and other Internet attacks were examined in this study. Finally, we present recommendations based on the collected data to deal with this common problem.
(This article belongs to the Special Issue Cybersecurity, Threat Analysis and the Management of Risk)
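For illustration, a one-way ANOVA of the kind mentioned in the abstract can be computed by hand: the F statistic is the ratio of between-group to within-group variance. The three groups and their Likert-scale awareness scores below are invented, not the study's data:

```python
# Illustrative one-way ANOVA on hypothetical awareness scores (1-5 scale)
# for three made-up student groups.
def one_way_anova_f(*groups):
    """Return the F statistic: between-group over within-group variance."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-group sum of squares, df = k - 1
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1
    # Within-group sum of squares, df = N - k
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

engineering = [4, 5, 4, 5, 4]
medicine = [3, 4, 3, 3, 4]
humanities = [2, 3, 2, 3, 2]
f_stat = one_way_anova_f(engineering, medicine, humanities)
```

A large F relative to the critical value at the chosen significance level indicates that at least one group mean differs from the others.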
Show Figures

Figure 1

Article
Estimating Causal Effects When the Treatment Affects All Subjects Simultaneously: An Application
Big Data Cogn. Comput. 2021, 5(2), 22; https://doi.org/10.3390/bdcc5020022 - 06 May 2021
Cited by 1 | Viewed by 2725
Abstract
Several important questions cannot be answered with the standard toolkit of causal inference, since all subjects are treated for a given period and there is thus no control group. One example of this type of question is the impact of carbon dioxide emissions on global warming. In this paper, we address this question using a machine learning method that allows causal impacts to be estimated in settings where a randomized experiment is not feasible. We discuss the conditions under which this method can identify a causal impact, and we find that carbon dioxide emissions are responsible for an increase in average global temperature of about 0.3 degrees Celsius between 1961 and 2011. We offer two main contributions. First, we provide one additional application of machine learning to answer causal questions of policy relevance. Second, by applying a methodology that relies on few directly testable assumptions and is easy to replicate, we provide robust evidence of the man-made nature of global warming, which could reduce incentives to turn to biased sources of information that fuel climate change skepticism.
(This article belongs to the Special Issue Big Data Analytics for Social Services)
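The core idea of estimating an effect without a control group can be sketched in a few lines: fit a model on the pre-treatment period, project it over the treatment period, and read the effect off the gap between observed and predicted values. The data and the simple linear model below are purely illustrative, not the authors' method:

```python
# Toy counterfactual estimate: fit a trend on the pre-treatment window,
# extrapolate it, and average the observed-minus-predicted gap.
def fit_line(ts, ys):
    """Ordinary least squares for y = a + b * t."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    b = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / \
        sum((t - mean_t) ** 2 for t in ts)
    return mean_y - b * mean_t, b

# Synthetic series: a clean pre-treatment trend, then a +3.0 treatment shift.
pre_t = list(range(10))
pre_y = [2.0 + 0.5 * t for t in pre_t]
post_t = list(range(10, 20))
post_y = [2.0 + 0.5 * t + 3.0 for t in post_t]

a, b = fit_line(pre_t, pre_y)
counterfactual = [a + b * t for t in post_t]
effect = sum(o - c for o, c in zip(post_y, counterfactual)) / len(post_t)
```

The validity of such an estimate rests entirely on the assumption that the pre-treatment model would have continued to hold absent treatment, which is why the paper's discussion of identifying conditions matters.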
Show Figures

Figure 1

Article
Big Remote Sensing Image Classification Based on Deep Learning Extraction Features and Distributed Spark Frameworks
Big Data Cogn. Comput. 2021, 5(2), 21; https://doi.org/10.3390/bdcc5020021 - 05 May 2021
Cited by 2 | Viewed by 3148
Abstract
Big data analysis plays a significant role in Earth observation using remote sensing images, since the explosion of image data from multiple sensors affects several fields. Traditional data analysis techniques have various limitations in storing and processing massive volumes of data. Moreover, big remote sensing data analytics demand sophisticated algorithms and specific techniques to store and process the data in real time or near real time with high accuracy, efficiency, and speed. In this paper, we present a method for storing a huge number of heterogeneous satellite images based on the Hadoop distributed file system (HDFS) and Apache Spark. We also show how deep learning algorithms such as VGGNet and UNet can benefit big remote sensing data processing for feature extraction and classification. The obtained results show that our approach outperforms other methods.
(This article belongs to the Special Issue Machine Learning and Data Analysis for Image Processing)
Show Figures

Figure 1

Article
Traceability for Trustworthy AI: A Review of Models and Tools
Big Data Cogn. Comput. 2021, 5(2), 20; https://doi.org/10.3390/bdcc5020020 - 04 May 2021
Cited by 3 | Viewed by 3621
Abstract
Traceability is considered a key requirement for trustworthy artificial intelligence (AI), related to the need to maintain a complete account of the provenance of data, processes, and artifacts involved in the production of an AI model. Traceability in AI shares part of its scope with general-purpose recommendations for provenance such as W3C PROV, and it is also supported to different extents by specific tools used by practitioners as part of their efforts to make data analytic processes reproducible or repeatable. Here, we review relevant tools, practices, and data models for traceability in their connection to building AI models and systems. We also propose some minimal requirements for considering a model traceable according to the assessment list of the High-Level Expert Group on AI. Our review shows that, although a good number of reproducibility tools are available, a common approach is currently lacking, as are shared semantics. We have also found that some tools have either not achieved full maturity or are already falling into obsolescence or a state of near abandonment by their developers, which might compromise the reproducibility of the research entrusted to them.
(This article belongs to the Special Issue Big Data and Cognitive Computing: 5th Anniversary Feature Papers)
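A minimal sketch of the kind of traceability record such requirements imply might hash the training data, pin the code version, and store the hyperparameters alongside the model artifact. The field names below are our own invention, not a standard such as W3C PROV:

```python
# Hypothetical minimal provenance entry for one model-training run.
import hashlib
import json

def traceability_record(train_data: bytes, code_version: str, params: dict) -> str:
    """Serialize a provenance entry linking data, code, and configuration."""
    record = {
        "data_sha256": hashlib.sha256(train_data).hexdigest(),  # what was trained on
        "code_version": code_version,                           # which code produced it
        "hyperparameters": params,                              # how it was configured
    }
    return json.dumps(record, sort_keys=True)

entry = traceability_record(b"row1,row2", "git:abc123", {"lr": 0.01, "epochs": 10})
```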
Show Figures

Figure 1

Article
Deep Neural Network Analysis for Environmental Study of Coral Reefs in the Gulf of Eilat (Aqaba)
Big Data Cogn. Comput. 2021, 5(2), 19; https://doi.org/10.3390/bdcc5020019 - 30 Apr 2021
Viewed by 2749
Abstract
Coral reefs are undergoing a severe decline due to ocean acidification, seawater warming, and anthropogenic eutrophication. We demonstrate the applicability of deep learning (DL) for following these changes. We examined the distribution and frequency of appearance of the eleven most common coral species at four sites in the Gulf of Eilat, and we compared deep learning with conventional census methods. The methods used in this research were natural sampling units (photographs of the coral reef), line transects for estimating the cover percentage at the four test sites, and deep convolutional neural networks, which proved to be an efficient classifier of coral species under supervised deep learning. The main research goal was to identify the common coral species at the four test sites in the Gulf of Eilat, using DL to detect differences in coral cover and species composition among the sites, and to relate these to ecological characteristics, such as depth and anthropogenic disturbance. The use of this method will produce a vital database for following changes in coral reefs over time, identifying trend lines, and recommending remediation measures accordingly. We outline future monitoring needs and the corresponding system developments required to meet them.
Show Figures

Figure 1

Article
Deep Automation Bias: How to Tackle a Wicked Problem of AI?
Big Data Cogn. Comput. 2021, 5(2), 18; https://doi.org/10.3390/bdcc5020018 - 20 Apr 2021
Cited by 2 | Viewed by 3797
Abstract
The increasing use of AI in different societal contexts has intensified the debate on risks, ethical problems, and bias. Accordingly, promising research activities focus on debiasing to strengthen fairness, accountability, and transparency in machine learning. There is, though, a tendency to fix societal and ethical issues with technical solutions that may cause additional, wicked problems. Alternative analytical approaches are thus needed to avoid this and to understand how societal and ethical issues arise in AI systems. Although bias takes various forms, risks ultimately result from conflicts between the behavior of an AI system, shaped by feature complexity, and user practices that offer limited options for scrutiny; hence, automation is their common ground. The paper highlights the role of automation and explains why deep automation bias (DAB) is a metarisk of AI. Based on prior work, it elaborates the main influencing factors and develops a heuristic model for assessing DAB-related risks in AI systems. This model aims at raising problem awareness and supporting training on the sociotechnical risks resulting from AI-based automation, and it contributes to improving the general explicability of AI systems beyond technical issues.
Show Figures

Figure 1

Article
GeoLOD: A Spatial Linked Data Catalog and Recommender
Big Data Cogn. Comput. 2021, 5(2), 17; https://doi.org/10.3390/bdcc5020017 - 19 Apr 2021
Cited by 1 | Viewed by 2955
Abstract
The increasing availability of linked data poses new challenges for the identification and retrieval of the most appropriate data sources that meet user needs. Recent dataset catalogs and recommenders provide advanced methods that facilitate linked data search, but none exploits the spatial characteristics of datasets. In this paper, we present GeoLOD, a web catalog of spatial datasets and classes and a recommender for spatial datasets and classes possibly relevant for link discovery processes. GeoLOD Catalog parses, maintains and generates metadata about datasets and classes provided by SPARQL endpoints that contain georeferenced point instances. It offers text and map-based search functionality and dataset descriptions in GeoVoID, a spatial dataset metadata template that extends VoID. GeoLOD Recommender pre-computes and maintains, for all identified spatial classes in the Web of Data (WoD), ranked lists of classes relevant for link discovery. In addition, the on-the-fly Recommender allows users to define an uncatalogued SPARQL endpoint, a GeoJSON or a Shapefile and get class recommendations in real time. Furthermore, generated recommendations can be automatically exported in SILK and LIMES configuration files in order to be used for a link discovery task. In the results, we provide statistics about the status and potential connectivity of spatial datasets in the WoD, we assess the applicability of the recommender, and we present the outcome of a system usability study. GeoLOD is the first catalog that targets both linked data experts and geographic information systems professionals, exploits geographical characteristics of datasets and provides an exhaustive list of WoD spatial datasets and classes along with class recommendations for link discovery.
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
Show Figures

Figure 1

Article
Wine Ontology Influence in a Recommendation System
Big Data Cogn. Comput. 2021, 5(2), 16; https://doi.org/10.3390/bdcc5020016 - 15 Apr 2021
Viewed by 3107
Abstract
Wine is the second most popular alcoholic drink in the world, behind beer. With the rise of e-commerce, recommendation systems have become a very important factor in the success of a business. Recommendation systems analyze metadata to predict if, for example, a user will recommend a product. The metadata consist mostly of former reviews or web traffic from the same user. For this reason, we investigate what would happen if the information analyzed by a recommendation system were insufficient. In this paper, we explore the effects of a new wine ontology on a recommendation system. We created our own wine ontology and then ran two sets of tests for each dataset. In both sets of tests, we applied four machine learning clustering algorithms with the objective of predicting whether a user recommends a wine product. The only difference between the two sets of tests is the attributes contained in the dataset. In the first set of tests, the datasets were enriched by the ontology, and in the second set, the only information about a wine product was its name. We compared the results of the two test sets and observed a significant increase in classification accuracy when using the dataset with the proposed ontology. We demonstrate the general applicability of the methodology to other cases by applying our proposal to an Amazon product review dataset.
(This article belongs to the Special Issue Big Data and Cognitive Computing: 5th Anniversary Feature Papers)
Show Figures

Figure 1
