Search Results (19)

Search Parameters:
Keywords = formats for storing big data

22 pages, 2702 KB  
Article
Spatial Heterogeneity of Intra-Urban E-Commerce Demand and Its Retail-Delivery Interactions: Evidence from Waybill Big Data
by Yunnan Cai, Jiangmin Chen and Shijie Li
J. Theor. Appl. Electron. Commer. Res. 2025, 20(3), 190; https://doi.org/10.3390/jtaer20030190 - 1 Aug 2025
Viewed by 432
Abstract
E-commerce growth has reshaped consumer behavior and retail services, driving parcel demand and challenging last-mile logistics. Existing research predominantly relies on survey data and global regression models that overlook intra-urban spatial heterogeneity in shopping behaviors. This study bridges this gap by analyzing e-commerce demand’s spatial distribution from a retail service perspective, identifying key drivers, and evaluating implications for omnichannel strategies and logistics. Utilizing waybill big data, spatial analysis, and multiscale geographically weighted regression, we reveal: (1) High-density e-commerce demand areas are predominantly located in central districts, whereas peripheral regions exhibit statistically lower volumes. The spatial distribution pattern of e-commerce demand aligns with the urban development spatial structure. (2) Factors such as population density and education levels significantly influence e-commerce demand. (3) Convenience stores play a dual role as retail service providers and parcel collection points, reinforcing their importance in shaping consumer accessibility and service efficiency, particularly in underserved urban areas. (4) Supermarkets exert a substitution effect on online shopping by offering immediate product availability, highlighting their role in shaping consumer purchasing preferences and retail service strategies. These findings contribute to retail and consumer services research by demonstrating how spatial e-commerce demand patterns reflect consumer shopping preferences, the role of omnichannel retail strategies, and the competitive dynamics between e-commerce and physical retail formats. Full article
(This article belongs to the Topic Data Science and Intelligent Management)

34 pages, 12304 KB  
Article
Updating of the Archival Large-Scale Soil Map Based on the Multitemporal Spectral Characteristics of the Bare Soil Surface Landsat Scenes
by Dmitry I. Rukhovich, Polina V. Koroleva, Alexey D. Rukhovich and Mikhail A. Komissarov
Remote Sens. 2023, 15(18), 4491; https://doi.org/10.3390/rs15184491 - 12 Sep 2023
Cited by 9 | Viewed by 1842
Abstract
For most of the arable land in Russia (132–137 million ha), the dominant and accurate soil information is stored in the form of map archives on paper without coordinate reference. The last traditional soil map(s) (TSM, TSMs) were created over 30 years ago. Traditional and/or archival soil map(s) (ASM, ASMs) are outdated in terms of storage formats, dates, and methods of production. The technology of constructing a multitemporal soil line (MSL) makes it possible to update ASMs and TSMs based on the processing of big remote-sensing data (RSD). To construct an MSL, the spectral characteristics of the bare soil surface (BSS) are used. The BSS on RSD is distinguished within the framework of the conceptual apparatus of the spectral neighborhood of the soil line. The filtering of big RSD is based on deep machine learning. In the course of the work, a vector georeferenced version of the ASM and an updated soil map were created based on the coefficient “C” of the MSL. The maps were verified based on field surveys (76 soil pits). The updated map is called the map of soil interpretation of the coefficient “C” (SIC “C”). The SIC “C” map has a more detailed legend compared to the ASM (7 sections/chapters instead of 5), greater accuracy (smaller errors of the first and second kind), and potential suitability for calculating soil organic matter/carbon (SOM/SOC) reserves (soil types/areals in the SIC “C” map are statistically significantly divided according to the thickness of the organomineral horizon and the content of SOM in the plowed layer). When updating, a systematic underestimation of the numbers of contours and areas of soils with manifestations of negative/degradation soil processes (slitization and erosion) on the TSM was established. In the process of updating, all three shortcomings of the ASMs/TSMs (archaic storage, dates, and methods of creation) were eliminated.
The SIC “C” map is digital (thematic raster), modern, and created based on big data processing methods. For the first time, the actualization of the soil map was carried out based on the MSL characteristics (coefficient “C”). Full article
(This article belongs to the Special Issue Remote Sensing for Soil Mapping and Monitoring)

18 pages, 4749 KB  
Article
Effective Ransomware Detection Using Entropy Estimation of Files for Cloud Services
by Kyungroul Lee, Jaehyuk Lee, Sun-Young Lee and Kangbin Yim
Sensors 2023, 23(6), 3023; https://doi.org/10.3390/s23063023 - 10 Mar 2023
Cited by 7 | Viewed by 3658
Abstract
A variety of data-based services such as cloud services and big data-based services have emerged in recent times. These services store data and derive the value of the data. The reliability and integrity of the data must be ensured. Unfortunately, attackers have taken valuable data as hostage for money in attacks called ransomware. It is difficult to recover original data from files in systems infected by ransomware because they are encrypted and cannot be accessed without keys. There are cloud services to backup data; however, encrypted files are synchronized with the cloud service. Therefore, the original file cannot be restored even from the cloud when the victim systems are infected. Therefore, in this paper, we propose a method to effectively detect ransomware for cloud services. The proposed method detects infected files by estimating the entropy to synchronize files based on uniformity, one of the characteristics of encrypted files. For the experiment, files containing sensitive user information and system files for system operation were selected. In this study, we detected 100% of the infected files in all file formats, with no false positives or false negatives. We demonstrate that our proposed ransomware detection method was very effective compared to other existing methods. Based on the results of this paper, we expect that this detection method will not synchronize with a cloud server by detecting infected files even if the victim systems are infected with ransomware. In addition, we expect to restore the original files by backing up the files stored on the cloud server. Full article
(This article belongs to the Special Issue Sensors Young Investigators’ Contributions Collection)
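The abstract above relies on the near-uniform byte distribution of encrypted files. A minimal sketch of that idea, assuming plain Shannon entropy over byte frequencies, follows. The abstract does not specify the estimator the authors use, and the function names and the 7.5 bits-per-byte threshold are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose byte distribution is close to uniform.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return shannon_entropy(data) >= threshold
```

Compressed formats also produce high byte entropy, so a check this simple would likely need additional signals before a sync client could act on it.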

28 pages, 7707 KB  
Review
Blockchain Integration in the Era of Industrial Metaverse
by Dimitris Mourtzis, John Angelopoulos and Nikos Panopoulos
Appl. Sci. 2023, 13(3), 1353; https://doi.org/10.3390/app13031353 - 19 Jan 2023
Cited by 105 | Viewed by 11478
Abstract
Blockchain can be realized as a distributed and decentralized database, also known as a “distributed ledger,” that is shared among the nodes of a computer network. Blockchain is a form of democratized and distributed database for storing information electronically in a digital format. Under the framework of Industry 4.0, the digitization and digitalization of manufacturing and production systems and networks have been focused, thus Big Data sets are a necessity for any manufacturing activity. Big Data sets are becoming a useful resource as well as a byproduct of the activities/processes taking place. However, there is an imminent risk of cyberattacks. The contribution of blockchain technology to intelligent manufacturing can be summarized as (i) data validity protection, (ii) inter- and intra-organizational communication organization, and (iii) efficiency improvement of manufacturing processes. Furthermore, the need for increased cybersecurity is magnified as the world is heading towards a super smart and intelligent societal model, also known as “Society 5.0,” and the industrial metaverse will become the new reality in manufacturing. Blockchain is a cutting-edge, secure information technology that promotes business and industrial innovation. However, blockchain technologies are bound by existing limitations regarding scalability, flexibility, and cybersecurity. Therefore, in this literature review, the implications of blockchain technology for addressing the emerging cybersecurity barriers toward safe and intelligent manufacturing in Industry 5.0 as a subset of Society 5.0 are presented. Full article

26 pages, 3176 KB  
Article
Forensic Analysis of TikTok Alternatives on Android and iOS Devices: Byte, Dubsmash, and Triller
by Yansi Keim, Shinelle Hutchinson, Apoorva Shrivastava and Umit Karabiyik
Electronics 2022, 11(18), 2972; https://doi.org/10.3390/electronics11182972 - 19 Sep 2022
Cited by 13 | Viewed by 7526
Abstract
TikTok has consistently been one of the most used mobile apps worldwide on any mobile operating system. However, despite people’s enjoyment of using the application, there have been growing concerns about the application’s origins and alleged privacy violations. These allegations have become such a big problem that the former President of the United States, Donald Trump, expressed a desire to ban the TikTok application from being offered on US application stores like Google’s Play Store and Apple’s App Store. This remark sent TikTok users into a frenzy to find alternatives before the ban took effect. To this end, several alternative applications for TikTok have surfaced and are already garnering millions of users. In this paper, we identified three popular alternatives to the TikTok application (Byte, Dubsmash, and Triller) and forensically analyzed each on smartphones of Android version 8 and iOS version 13. We focused on identifying forensically relevant artifacts that may be helpful to investigators in the event of a criminal investigation, should these or similar apps fall under scrutiny. We used Magnet AXIOM Process and Cellebrite UFED 4PC for acquisition, and Magnet AXIOM Examine and DB Browser for SQLite for analysis and reading. The investigation resulted in successful extraction of expected yet unique data points, plain text sensitive data, directories and format. These results lead to a discussion about identifying and comparing these apps’ privacy concerns with those of TikTok, as formulated from the literature. Full article
(This article belongs to the Special Issue Digital Security and Privacy Protection: Trends and Applications)

22 pages, 12366 KB  
Article
Famous Chinese Traditional Dishes: Spatial Diffusion of Roast Duck in Mainland China and Spatial Association Characteristics of Chain Stores
by Ke Zhang, Yanjun Ye, Yingqiao Qiu and Xinfeng Li
Sustainability 2022, 14(14), 8554; https://doi.org/10.3390/su14148554 - 13 Jul 2022
Viewed by 2726
Abstract
The spatial pattern and geographical diffusion of Chinese traditional food culture are important manifestations of population migration and cultural chain remodeling. Taking the national roast duck stores and Beijing Quanjude and Bianyifang brand chain roast duck stores as the research objects, the spatial distribution characteristics and geographic diffusion patterns of roast duck stores, and the spatial association characteristics of the chain stores are analyzed by using spatial analysis methods and mathematical statistics. The results of the study showed that: (1) The roast duck stores in the mainland show an overall northeast-southwest direction, and the spatial distribution is extremely uneven. The eastern coast of China shows a high-value continuous distribution, from the Bohai Bay Economic Circle and the Yangtze River Delta Economic Circle, gradually radiating westward to the middle and showing the clustering characteristics of “point + surface”. (2) Using the point cluster analysis method, the diffusion pattern of roast duck stores in the three major economic zones of China is explored, and roast duck stores in the western region show the characteristics of contact diffusion combined with hierarchical diffusion. Contact diffusion is the main diffusion mode of roast duck stores in the east. The central region shows the diffusion characteristics of contact diffusion combined with hierarchical diffusion. Overall, the roast duck stores in mainland China show a composite diffusion pattern. (3) Quanjude and Bianyifang stores have spatial agglomeration characteristics, Quanjude chain stores have a slightly stronger central pointing, while Bianyifang roast duck chain stores have slightly wider spatial diffusion. Both brands significantly show spatial orientation close to transportation facilities and high consumption markets. The street population has a slightly weaker influence on the spatial distribution of the two brands. 
(4) Through the multivariate spatial analysis method, it is found that the spatial correlation of mutual attraction between Quanjude and Bianyifang roast duck chain stores is presented, but there are differences in the formation mechanism and weak asymmetry in the attraction intensity, which is related to the consumer population and corporate positioning of Quanjude and Bianyifang. With the advent of the big data era, it is possible to obtain and use big data analysis methods to reshape the deep information under the surface logic. Attention should be paid to the location choice of traditional restaurant chains in the new era, to explore the possibilities of enterprise development, and to improve the efficiency of urban space. Full article

20 pages, 7099 KB  
Article
Quantification of Spatial Association between Commercial and Residential Spaces in Beijing Using Urban Big Data
by Lei Zhou, Ming Liu, Zhenlong Zheng and Wei Wang
ISPRS Int. J. Geo-Inf. 2022, 11(4), 249; https://doi.org/10.3390/ijgi11040249 - 11 Apr 2022
Cited by 12 | Viewed by 3467
Abstract
Commercial and residential spaces are two core types of geographical objects in urban areas. However, these two types of spaces are not independent of each other. Spatial associations exist between them, and a thorough understanding of this spatial association is of great significance for improving the efficiency of urban spatial allocation and realizing scientific spatial planning and governance. Thus, in this paper, the spatial association between commercial and residential spaces in Beijing is quantified with GIS spatial analysis of the average nearest neighbor distance, kernel density, spatial correlation, and honeycomb grid analysis. Point-of-interest (POI) big data of the commercial and residential spaces is used in the quantification since this big data represents a comprehensive sampling of these two spaces. The results show that the spatial distributions of commercial and residential spaces are highly correlated, maintaining a relatively close consumption spatial association. However, the degrees of association between different commercial formats and residential spaces vary, presenting the spatial association characteristics of “integration of daily consumption and separation of nondaily consumption”. The commercial formats of catering services, recreation and leisure services, specialty stores, and agricultural markets are strongly associated with the residential spaces. However, the development of frequently used commercial formats of daily consumption such as living services, convenience stores, and supermarkets appears to lag behind the development of residential spaces. In addition, large-scale comprehensive and specialized commercial formats such as shopping malls, home appliances and electronics stores, and home building materials markets are lagging behind the residential spaces over a wide range. 
This paper is expected to provide development suggestions for the transformation of urban commercial and residential spaces and the construction of “people-oriented” smart cities. Full article
(This article belongs to the Special Issue Applications of GIScience for Land Administration)

11 pages, 992 KB  
Article
A Quality Control Methodology for Heterogeneous Vehicular Data Streams
by Konstantina Remoundou, Theodoros Alexakis, Nikolaos Peppes, Konstantinos Demestichas and Evgenia Adamopoulou
Sensors 2022, 22(4), 1550; https://doi.org/10.3390/s22041550 - 18 Feb 2022
Cited by 2 | Viewed by 2531
Abstract
The rapid evolution of sensors and communication technologies has led to the production and transfer of mass data streams from vehicles either inside their electronic units or to the outside world using the internet infrastructure. The “outside world”, in most cases, consists of third-party applications, such as fleet or traffic management control centers, which utilize vehicular data for reporting and monitoring functionalities. Such applications, in most cases, in order to facilitate their needs, require the exchange and processing of vast amounts of data which can be handled by the so-called Big Data technologies. The purpose of this study is to present a hybrid platform suitable for data collection, storing and analysis enhanced with quality control actions. In particular, the collected data contain various formats originating from different vehicle sensors and are stored in the aforementioned platform in a continuous way. The stored data in this platform must be checked in order to determine and validate them in terms of quality. To do so, certain actions, such as missing values checks, format checks, range checks, etc., must be carried out. The results of the quality control functions are presented herein, and useful conclusions are drawn in order to avoid possible data quality problems which may occur in further analysis and use of the data, e.g., for training of artificial intelligence models. Full article
(This article belongs to the Special Issue State-of-the-Art Sensors Technology in Greece)
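The quality control actions named in the abstract above (missing-value, format, and range checks) can be sketched as follows. This is a minimal illustration, not the authors' platform: the field names, the ISO 8601 timestamp convention, and the numeric ranges are all hypothetical assumptions.

```python
from datetime import datetime
from typing import List

# Hypothetical schema: required fields and plausible value ranges.
REQUIRED = ("vehicle_id", "timestamp", "speed_kmh", "rpm")
RANGES = {"speed_kmh": (0.0, 300.0), "rpm": (0.0, 10000.0)}


def check_record(record: dict) -> List[str]:
    """Return a list of quality issues found in one sensor record."""
    issues = []

    # Missing-value check: every required field must be present and non-empty.
    for field in REQUIRED:
        if record.get(field) in (None, ""):
            issues.append(f"missing:{field}")

    # Format check: the timestamp is assumed to be an ISO 8601 string.
    ts = record.get("timestamp")
    if ts not in (None, ""):
        try:
            datetime.fromisoformat(ts)
        except (TypeError, ValueError):
            issues.append("format:timestamp")

    # Format and range checks for numeric fields.
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if value in (None, ""):
            continue  # already reported as missing
        try:
            number = float(value)
        except (TypeError, ValueError):
            issues.append(f"format:{field}")
            continue
        if not low <= number <= high:
            issues.append(f"range:{field}")

    return issues
```

Returning issue labels instead of raising on the first failure lets a pipeline count each kind of problem per stream before deciding whether to drop or repair a record.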

9 pages, 1150 KB  
Article
Experimental Characteristics Study of Data Storage Formats for Data Marts Development within Data Lakes
by Vladimir Belov, Alexander N. Kosenkov and Evgeny Nikulchev
Appl. Sci. 2021, 11(18), 8651; https://doi.org/10.3390/app11188651 - 17 Sep 2021
Cited by 5 | Viewed by 3407
Abstract
One of the most popular methods for building analytical platforms involves the use of the concept of data lakes. A data lake is a storage system in which the data are presented in their original format, making it difficult to conduct analytics or present aggregated data. To solve this issue, data marts are used, representing environments of stored data of highly specialized information, focused on the requests of employees of a certain department, the vector of an organization’s work. This article presents a study of big data storage formats in the Apache Hadoop platform when used to build data marts. Full article
(This article belongs to the Special Issue Big Data: Advanced Methods, Interdisciplinary Study and Applications)

12 pages, 1703 KB  
Article
A Convenient and Low-Cost Model of Depression Screening and Early Warning Based on Voice Data Using for Public Mental Health
by Xin Chen and Zhigeng Pan
Int. J. Environ. Res. Public Health 2021, 18(12), 6441; https://doi.org/10.3390/ijerph18126441 - 14 Jun 2021
Cited by 27 | Viewed by 4381
Abstract
Depression is a common mental illness that does great harm to public health. At present, the diagnosis of depression mainly depends on interviews between doctors and patients, which are subjective, slow, and expensive. Voice data are easy to obtain and inexpensive, and they have been shown to be useful in the diagnosis of depression. The voice data used for modeling in this study came from an authoritative public data set that had passed ethical review. The features of the voice data were extracted with Python, and the voice features were stored as CSV files. Through data processing, a large database containing 1479 voice feature samples was generated for modeling. Then, a decision tree screening model for depression was established by 10-fold cross validation and algorithm selection. The experiment achieved 83.4% prediction accuracy on the voice data set. According to the prediction results of the model, patients can be given early warning and timely intervention, so as to realize personal health management of depression. Full article
(This article belongs to the Section Mental Health)

27 pages, 14001 KB  
Article
Big Data in Smart City: Management Challenges
by Mladen Amović, Miro Govedarica, Aleksandra Radulović and Ivana Janković
Appl. Sci. 2021, 11(10), 4557; https://doi.org/10.3390/app11104557 - 17 May 2021
Cited by 23 | Viewed by 9263
Abstract
Smart cities use digital technologies such as cloud computing, Internet of Things, or open data in order to overcome limitations of traditional representation and exchange of geospatial data. This concept ensures a significant increase in the use of data to establish new services that contribute to better sustainable development and monitoring of all phenomena that occur in urban areas. The use of the modern geoinformation technologies, such as sensors for collecting different geospatial and related data, requires adequate storage options for further data analysis. In this paper, we suggest the biG dAta sMart cIty maNagEment SyStem (GAMINESS) that is based on the Apache Spark big data framework. The model of the GAMINESS management system is based on the principles of the big data modeling, which differs greatly from standard databases. This approach provides the ability to store and manage huge amounts of structured, semi-structured, and unstructured data in real time. System performance is increasing to a higher level by using the process parallelization explained through the five V principles of the big data paradigm. The existing solutions based on the five V principles are focused only on the data visualization, not the data themselves. Such solutions are often limited by different storage mechanisms and by the ability to perform complex analyses on large amounts of data with expected performance. The GAMINESS management system overcomes these disadvantages by conversion of smart city data to a big data structure without limitations related to data formats or use standards. The suggested model contains two components: a geospatial component and a sensor component that are based on the CityGML and the SensorThings standards. The developed model has the ability to exchange data regardless of the used standard or the data format into proposed Apache Spark data framework schema. 
The verification of the proposed model is done within the case study for the part of the city of Novi Sad. Full article
(This article belongs to the Special Issue Application of Data Science in Smart Cities)

22 pages, 6451 KB  
Article
Choosing a Data Storage Format in the Apache Hadoop System Based on Experimental Evaluation Using Apache Spark
by Vladimir Belov, Andrey Tatarintsev and Evgeny Nikulchev
Symmetry 2021, 13(2), 195; https://doi.org/10.3390/sym13020195 - 26 Jan 2021
Cited by 17 | Viewed by 4535
Abstract
One of the most important tasks of any platform for big data processing is storing the data received. Different systems have different requirements for the storage formats of big data, which raises the problem of choosing the optimal data storage format to solve the current problem. This paper describes the five most popular formats for storing big data, presents an experimental evaluation of these formats and a methodology for choosing the format. The following data storage formats will be considered: avro, CSV, JSON, ORC, parquet. At the first stage, a comparative analysis of the main characteristics of the studied formats was carried out; at the second stage, an experimental evaluation of these formats was prepared and carried out. For the experiment, an experimental stand was deployed with tools for processing big data installed on it. The aim of the experiment was to find out characteristics of data storage formats, such as the volume and processing speed for different operations using the Apache Spark framework. In addition, within the study, an algorithm for choosing the optimal format from the presented alternatives was developed using tropical optimization methods. The result of the study is presented in the form of a technique for obtaining a vector of ratings of data storage formats for the Apache Hadoop system, based on an experimental assessment using Apache Spark. Full article
(This article belongs to the Special Issue 2020 Big Data and Artificial Intelligence Conference)
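One of the characteristics evaluated in the abstract above is the volume each format needs for the same data. The sketch below measures that for two of the five formats (CSV and JSON, written as one JSON object per line) using only the Python standard library. It is a simplified illustration, not the authors' experiment: the paper used Apache Spark on Apache Hadoop, and the helper names and record layout here are hypothetical.

```python
import csv
import io
import json


def csv_size(rows, fields):
    """Return the size in bytes of rows serialized as CSV with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def json_lines_size(rows):
    """Return the size in bytes of rows serialized as one JSON object per line."""
    # The +1 accounts for the newline that terminates each record.
    return sum(len(json.dumps(row).encode("utf-8")) + 1 for row in rows)


# Hypothetical sample records; every row has the same flat schema.
rows = [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(1000)]
sizes = {
    "csv": csv_size(rows, ["id", "name", "score"]),
    "json": json_lines_size(rows),
}
```

For flat records like these, JSON repeats the field names in every row while CSV states them once in the header, which is why CSV comes out smaller here. Binary columnar formats such as ORC and Parquet would need third-party libraries and are left out of this sketch.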

16 pages, 270 KB  
Article
Data Lake Governance: Towards a Systemic and Natural Ecosystem Analogy
by Marzieh Derakhshannia, Carmen Gervet, Hicham Hajj-Hassan, Anne Laurent and Arnaud Martin
Future Internet 2020, 12(8), 126; https://doi.org/10.3390/fi12080126 - 27 Jul 2020
Cited by 18 | Viewed by 5387
Abstract
The realm of big data has brought new venues for knowledge acquisition, but also major challenges including data interoperability and effective management. The great volume of miscellaneous data renders the generation of new knowledge a complex data analysis process. Presently, big data technologies provide multiple solutions and tools towards the semantic analysis of heterogeneous data, including their accessibility and reusability. However, in addition to learning from data, we are faced with the issue of data storage and management in a cost-effective and reliable manner. This is the core topic of this paper. A data lake, inspired by the natural lake, is a centralized data repository that stores all kinds of data in any format and structure. This allows any type of data to be ingested into the data lake without any restriction or normalization. This could lead to a critical problem known as data swamp, which can contain invalid or incoherent data that adds no values for further knowledge acquisition. To deal with the potential avalanche of data, some legislation is required to turn such heterogeneous datasets into manageable data. In this article, we address this problem and propose some solutions concerning innovative methods, derived from a multidisciplinary science perspective to manage data lake. The proposed methods imitate the supply chain management and natural lake principles with an emphasis on the importance of the data life cycle, to implement responsible data governance for the data lake. Full article
(This article belongs to the Special Issue Selected Papers from the INSCI2019: Internet Science 2019)
20 pages, 341 KB  
Review
State-of-the-Art Geospatial Information Processing in NoSQL Databases
by Dongming Guo and Erling Onstein
ISPRS Int. J. Geo-Inf. 2020, 9(5), 331; https://doi.org/10.3390/ijgi9050331 - 19 May 2020
Cited by 41 | Viewed by 9568
Abstract
Geospatial information has been indispensable for many application fields, including traffic planning, urban planning, and energy management. Geospatial data are mainly stored in relational databases that have been developed over several decades, and most geographic information applications are desktop applications. With the arrival of big data, geospatial information applications are also being modified into, e.g., mobile platforms and Geospatial Web Services, which require changeable data schemas, faster query response times, and more flexible scalability than traditional spatial relational databases currently have. To respond to these new requirements, NoSQL (Not only SQL) databases are now being adopted for geospatial data storage, management, and queries. This paper reviews state-of-the-art geospatial data processing in the 10 most popular NoSQL databases. We summarize the supported geometry objects, main geometry functions, spatial indexes, query languages, and data formats of these 10 NoSQL databases. Moreover, the pros and cons of these NoSQL databases are analyzed in terms of geospatial data processing. A literature review and analysis showed that current document databases may be more suitable for massive geospatial data processing than are other NoSQL databases due to their comprehensive support for geometry objects and data formats and their performance, geospatial functions, index methods, and academic development. However, depending on the application scenarios, graph databases, key-value, and wide column databases have their own advantages. Full article
(This article belongs to the Special Issue State-of-the-Art in Spatial Information Science)
12 pages, 606 KB  
Article
A New Way to Store Simple Text Files
by Marcin Lawnik, Artur Pełka and Adrian Kapczyński
Algorithms 2020, 13(4), 101; https://doi.org/10.3390/a13040101 - 22 Apr 2020
Cited by 6 | Viewed by 5291
Abstract
In the era of ubiquitous digitization and the Internet of Things (IoT), information plays a vital role. All types of data are collected, and some of these data are stored as text files. An important aspect, regardless of the type of data, is file storage, especially the amount of disk space that is required. The less space used to store data sets, the lower the cost of this service. Another important aspect of storing data warehouses in the form of files is the cost of the data transmission needed for file transfer and processing. Moreover, the stored data should have at least minimal protection against access and reading by other entities. The aspects mentioned above are particularly important for large data sets like Big Data. Considering the above criteria, i.e., minimizing storage space and data transfer and ensuring minimum security, the main goal of the article was to show a new way of storing text files. This article presents a method that converts data from text files such as txt, json, html, and py to images (image files) in png format. Taking into account such criteria as the output size of the file, the results obtained for the test files confirm that the presented method makes it possible to reduce the need for disk space, as well as to hide data in an image file. The described method can be used for texts saved in extended ASCII and UTF-8 coding. Full article
(This article belongs to the Special Issue Big Data Solutions)
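The general idea in the abstract above, storing the bytes of a text file as the pixels of a PNG image, can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the authors' method: the abstract does not give their pixel mapping, so the 8-bit grayscale layout, the 4-byte length header, and the function names are hypothetical. The decoder only reads images written by this encoder.

```python
import math
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type and data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def text_to_png(text: str) -> bytes:
    """Encode text as an 8-bit grayscale PNG, one byte per pixel."""
    payload = text.encode("utf-8")
    # Assumption: a 4-byte length header lets the decoder drop the padding.
    payload = struct.pack(">I", len(payload)) + payload
    width = math.isqrt(len(payload) - 1) + 1  # ceiling of the square root
    height = -(-len(payload) // width)        # ceiling division
    padded = payload.ljust(width * height, b"\x00")
    # Each scanline starts with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + padded[row * width:(row + 1) * width] for row in range(height)
    )
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), defaults.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw, 9))
        + _chunk(b"IEND", b"")
    )


def png_to_text(png: bytes) -> str:
    """Decode an image produced by text_to_png back into text."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, width, idat = len(PNG_SIGNATURE), 0, b""
    while pos < len(png):
        length, kind = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            width = struct.unpack(">I", data[:4])[0]
        elif kind == b"IDAT":
            idat += data
        pos += 12 + length  # length field + type + data + CRC
    raw = zlib.decompress(idat)
    stride = width + 1  # one filter byte per scanline
    pixels = b"".join(raw[i + 1:i + stride] for i in range(0, len(raw), stride))
    size = struct.unpack(">I", pixels[:4])[0]
    return pixels[4:4 + size].decode("utf-8")
```

In this sketch any space saving comes from the DEFLATE compression inside the PNG container rather than from the pixel layout itself, and the text is hidden from casual viewing but not encrypted.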