Open Access | Data Descriptor
RetroTransformDB: A Dataset of Generic Transforms for Retrosynthetic Analysis
Data 2018, 3(2), 14; doi:10.3390/data3020014
Abstract
Presently, software tools for retrosynthetic analysis are widely used by organic, medicinal, and computational chemists. Rule-based systems make extensive use of collections of retro-reactions (transforms). While there are many public datasets of reactions in the synthetic direction (usually non-generic reactions), there are no publicly available databases of generic reactions in a computer-readable format that can be used for retrosynthetic analysis. Here we present RetroTransformDB, a dataset of transforms that we compiled and coded in SMIRKS line notation. The collection comprises more than 100 records, each including the reaction name, the SMIRKS line notation, the functional group to be obtained, and the transform type classification. All SMIRKS transforms were tested syntactically, semantically, and from a chemical point of view in different software platforms. The overall dataset design and the retrosynthetic fitness were analyzed and curated by organic chemistry experts. The RetroTransformDB dataset may be used by open-source and commercial software packages, as well as by chemoinformatics tools.
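Because each record couples a name, a SMIRKS string, a target functional group, and a transform type, a very basic syntactic check can be sketched without a cheminformatics toolkit. The Python below is a minimal sketch under stated assumptions: the `TransformRecord` field names and the amide example are hypothetical, and the check only confirms that atom-map numbers are unique and balanced across the `>>` separator. It does not replace the chemical validation the authors performed in dedicated software platforms.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Atom-map numbers sit inside bracket atoms, e.g. [C:1] or [NH2:3].
MAP_RE = re.compile(r"\[[^\]]*?:(\d+)\]")


@dataclass
class TransformRecord:
    """Hypothetical record holding the four fields named in the abstract."""
    name: str
    smirks: str
    functional_group: str
    transform_type: str


def map_numbers(side: str) -> list[int]:
    """Collect every atom-map number on one side of a SMIRKS string."""
    return [int(m) for m in MAP_RE.findall(side)]


def check_mapping(smirks: str) -> list[str]:
    """Return a list of problems; an empty list means the mapping looks consistent."""
    parts = smirks.split(">>")
    if len(parts) != 2:
        return ["expected exactly one '>>' separator"]
    target, precursors = parts  # retro direction: target >> precursors
    problems = []
    t_maps, p_maps = map_numbers(target), map_numbers(precursors)
    for label, maps in (("target", t_maps), ("precursor", p_maps)):
        if len(maps) != len(set(maps)):
            problems.append(f"duplicate map numbers on {label} side")
    unbalanced = sorted(set(t_maps) ^ set(p_maps))
    if unbalanced:
        problems.append(f"unbalanced map numbers: {unbalanced}")
    return problems


amide = TransformRecord(
    name="Amide disconnection (hypothetical example)",
    smirks="[C:1](=[O:2])[N:3]>>[C:1](=[O:2])[OH].[N:3]",
    functional_group="amide",
    transform_type="disconnection",
)
print(check_mapping(amide.smirks))  # []
```

A full toolkit (for example RDKit's reaction support) would be needed to check valence and to actually apply the transform to molecules.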

Open Access | Data Descriptor
Sigfox and LoRaWAN Datasets for Fingerprint Localization in Large Urban and Rural Areas
Data 2018, 3(2), 13; doi:10.3390/data3020013
Abstract
Because of the increasing relevance of the Internet of Things and location-based services, researchers are evaluating wireless positioning techniques, such as fingerprinting, on Low Power Wide Area Network (LPWAN) communication. Evaluating fingerprinting in large outdoor environments requires extensive, time-consuming measurement campaigns to create useful datasets. This paper presents three LPWAN datasets collected in large urban and rural areas. The goal is to provide the research community with a tool to evaluate fingerprinting algorithms in large outdoor environments. Over a period of three months, numerous mobile devices periodically obtained their location from a GPS receiver and transmitted it in a Sigfox or LoRaWAN message. This location data is stored, together with network information, in the appropriate LPWAN dataset. The first results of our basic fingerprinting implementation, which is also described in this paper, indicate a mean location estimation error of 214.58 m for the rural Sigfox dataset, 688.97 m for the urban Sigfox dataset, and 398.40 m for the urban LoRaWAN dataset. In the future, we will enlarge our current datasets and use them to evaluate and optimize our fingerprinting methods. We also intend to collect additional datasets for Sigfox, LoRaWAN, and NB-IoT.
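To illustrate what "fingerprinting" and "mean location estimation error" mean here, the following is a minimal k-nearest-neighbour sketch in Python. It is not the authors' implementation: the `{base_station_id: rssi_dbm}` fingerprint layout, the `-200 dBm` placeholder for base stations that did not receive a message, and averaging latitude/longitude of the neighbours are all assumptions.

```python
import math

NO_SIGNAL_DBM = -200.0  # assumed placeholder for base stations that did not receive the message


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def signal_distance(fp_a, fp_b):
    """Euclidean distance between two {base_station_id: rssi_dbm} fingerprints."""
    stations = set(fp_a) | set(fp_b)
    return math.sqrt(sum(
        (fp_a.get(s, NO_SIGNAL_DBM) - fp_b.get(s, NO_SIGNAL_DBM)) ** 2 for s in stations
    ))


def knn_estimate(query, training, k=3):
    """Average the positions of the k training fingerprints closest in signal space."""
    nearest = sorted(training, key=lambda row: signal_distance(query, row[0]))[:k]
    lat = sum(pos[0] for _, pos in nearest) / len(nearest)
    lon = sum(pos[1] for _, pos in nearest) / len(nearest)
    return lat, lon


def mean_error_m(test, training, k=3):
    """Mean location estimation error over (fingerprint, true_position) pairs."""
    errors = [haversine_m(*true_pos, *knn_estimate(fp, training, k)) for fp, true_pos in test]
    return sum(errors) / len(errors)
```

Averaging raw coordinates is only reasonable over small areas; a weighted or projected centroid would be a natural refinement.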

Open Access | Article
Comparison between Simulation and Analytical Methods in Reliability Data Analysis: A Case Study on Face Drilling Rigs
Data 2018, 3(2), 12; doi:10.3390/data3020012
Abstract
Collecting failure data and performing reliability analysis in an underground mining operation are challenging because of the harsh environment and the high level of production pressure. Therefore, achieving an accurate, fast, and applicable analysis for a fleet of underground equipment is usually difficult and time-consuming. This paper discusses the main challenges of reliability analysis for mining machinery by comparing three approaches: two analytical methods (white-box and black-box modeling) and a simulation approach. For this purpose, maintenance data for a fleet of face drilling rigs in a Swedish underground metal mine were extracted from the MAXIMO system over a period of two years and used in the analysis. The investigations reveal that the approaches perform differently in ranking the machines and in estimating their reliability. Nevertheless, all three methods provide similar outputs, although the simulation generally estimates the reliability of the studied machines at a higher level. The simulation and the white-box method sometimes provide exactly the same results, owing to their similar analytical structure. On average, 9% of the data are left out of the white-box analysis because of insufficient data for some subsystems of the studied rigs.
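The three approaches can be contrasted in a small Python sketch. It assumes, purely for illustration, that times between failures (TBF) are exponentially distributed and that a rig is a series system of its subsystems; the authors' actual distribution fitting and simulation model are not reproduced here, and the subsystem names in the usage are hypothetical.

```python
import math
import random
from statistics import mean


def exp_reliability(t, mtbf):
    """R(t) for an exponential (constant failure rate) model."""
    return math.exp(-t / mtbf)


def white_box_reliability(t, subsystem_tbfs):
    """Series system: the rig works only if every subsystem survives to time t."""
    r = 1.0
    for tbfs in subsystem_tbfs.values():
        r *= exp_reliability(t, mean(tbfs))
    return r


def black_box_reliability(t, system_tbfs):
    """Treat the rig as one unit and fit a single MTBF to system-level data."""
    return exp_reliability(t, mean(system_tbfs))


def simulated_reliability(t, subsystem_tbfs, n=100_000, seed=0):
    """Monte Carlo: sample subsystem lifetimes; the rig fails at the first one."""
    rng = random.Random(seed)
    rates = [1.0 / mean(tbfs) for tbfs in subsystem_tbfs.values()]
    survived = sum(
        1 for _ in range(n) if min(rng.expovariate(rate) for rate in rates) > t
    )
    return survived / n


subs = {"hydraulics": [100.0, 100.0], "boom": [200.0]}  # hypothetical TBF data (hours)
print(white_box_reliability(50, subs), simulated_reliability(50, subs))
```

Because the white-box model and this simulation share the same series structure, their estimates converge, which mirrors the abstract's observation that the two can give identical results.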

Open Access | Data Descriptor
SIMADL: Simulated Activities of Daily Living Dataset
Data 2018, 3(2), 11; doi:10.3390/data3020011
Abstract
With the realisation of the Internet of Things (IoT) paradigm, the analysis of Activities of Daily Living (ADLs) in a smart home environment is becoming an active research domain. Representative datasets are a key requirement for advancing research in smart home design. Such datasets are an integral part of visualising new smart home concepts, as well as of validating and evaluating emerging machine learning models. Machine learning techniques that learn ADLs from sensor readings are used to classify, predict, and detect anomalous patterns. Such techniques require data representing relevant smart home scenarios for training, testing, and validation. However, the development of such techniques is limited by the lack of real smart home datasets, owing to the excessive cost of building real smart homes. This paper provides two datasets, one for classification and one for anomaly detection. The datasets were generated using OpenSHS (Open Smart Home Simulator), simulation software for dataset generation that records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs for different contexts, e.g., weekdays, weekends, mornings, and evenings. In total, eighty-four files were generated, representing approximately 63 days' worth of activities. Forty-two files of simulated ADLs make up the classification dataset; the other forty-two make up the anomaly detection dataset, into which simulated anomalous patterns were injected.
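As a minimal illustration of the anomaly detection task, the Python sketch below learns how often each sensor state occurs at each hour in normal data and flags unseen combinations. The `(hour, tuple_of_active_sensors)` record layout and the sensor names are hypothetical; the actual OpenSHS file format should be taken from the dataset itself.

```python
from collections import Counter


def fit_profile(records):
    """Count how often each (hour, sensor_state) pair occurs in normal data."""
    return Counter((hour, state) for hour, state in records)


def flag_anomalies(profile, records, min_count=1):
    """Return indices of records whose (hour, state) pair was seen fewer than min_count times."""
    return [
        i for i, (hour, state) in enumerate(records)
        if profile[(hour, state)] < min_count
    ]


normal = [(7, ("kitchen_light",)), (7, ("kitchen_light",)), (23, ("bed_pressure",))]
observed = [(7, ("kitchen_light",)), (3, ("kitchen_light",))]
print(flag_anomalies(fit_profile(normal), observed))  # [1]
```

This frequency baseline is deliberately simple; the learned models the abstract refers to would replace it.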

Open Access | Article
Associative Root–Pattern Data and Distribution in Arabic Morphology
Data 2018, 3(2), 10; doi:10.3390/data3020010
Abstract
This paper presents a large-scale dataset for Arabic morphology from a cognitive point of view, considering the uniqueness of the root–pattern phenomenon. The focus is on studying this singularity by estimating the associative relationships between roots, as a higher level of abstraction of word meaning, and all their potential occurrences with multiple morpho-phonetic patterns. A major advantage of this approach is that it provides a novel, balanced, large-scale language resource, which can be viewed as an instantiated global root–pattern network consisting of roots, patterns, stems, and particles, estimated statistically for studying the morpho-phonetic level of cognition of Arabic. In this context, this paper asserts that a balanced root distribution is an additional key criterion for evaluating topic coverage in an Arabic corpus. Furthermore, several novel probabilistic morpho-phonetic measures and their distributions have been estimated, in the form of root and pattern entropies as well as bidirectional conditional probabilities of bigrams of stems, roots, and particles. Around 29.2 million ClueWeb webpages were extracted, filtered to remove non-Arabic text, and converted into a large textual dataset containing around 11.5 billion word forms and 9.3 million associative relationships. As this dataset focuses predominantly on the root–pattern phenomenon in Semitic languages, the acquired data may provide significant support for researchers studying phenomena of Arabic such as visual word cognition, morpho-phonetic perception, and morphological analysis, as well as cognitively motivated query expansion, spell-checking, and information retrieval. In addition, the data distributions and frequencies should make it easier to construct balanced corpora.
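The two families of measures named above, root entropies and bidirectional bigram probabilities, can be sketched in a few lines of Python. This assumes the standard Shannon entropy (in bits) over a root's pattern distribution and maximum-likelihood estimates P(b | a) = c(a, b) / c(a, ·) and P(a | b) = c(a, b) / c(·, b); the transliterated roots and pattern labels in the test are hypothetical placeholders, and the paper's exact estimators may differ.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a frequency table."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def root_entropy(root_pattern_pairs, root):
    """Entropy of the pattern distribution observed with one root."""
    return entropy(Counter(p for r, p in root_pattern_pairs if r == root))


def bigram_conditionals(tokens):
    """Forward P(b | a) and backward P(a | b) for every observed bigram (a, b)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    left, right = Counter(), Counter()
    for (a, b), c in bigrams.items():
        left[a] += c   # how often a starts a bigram
        right[b] += c  # how often b ends a bigram
    forward = {(a, b): c / left[a] for (a, b), c in bigrams.items()}
    backward = {(a, b): c / right[b] for (a, b), c in bigrams.items()}
    return forward, backward
```

A root used with many patterns in equal proportions gets a high entropy; a root bound to a single pattern gets zero.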

Open Access | Data Descriptor
A Data Set of Portuguese Traditional Recipes Based on Published Cookery Books
Data 2018, 3(1), 9; doi:10.3390/data3010009
Abstract
This paper presents a data set derived from books of traditional Portuguese recipes. Only starters, main courses, side dishes, and soups were considered; desserts, cakes, sweets, puddings, and pastries were not included. Each recipe was characterized by its province and its ingredients, regardless of quantities or preparation. An exploratory characterization of the recipes and ingredients is presented. The results show that traditional Portuguese recipes are organized differently across the eleven provinces considered, laying the groundwork for more detailed analyses of the 1382 recipes and 421 ingredients inventoried.
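Since each recipe is reduced to a province and a set of ingredients, one simple way to compare provinces is the Jaccard similarity of their ingredient sets. The Python sketch below assumes `(province, ingredients)` pairs; the choice of Jaccard similarity is an illustrative assumption rather than the paper's exploratory method, and the sample recipes in the test are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def province_ingredients(recipes):
    """Union of ingredient sets per province; recipes are (province, ingredients) pairs."""
    by_province = defaultdict(set)
    for province, ingredients in recipes:
        by_province[province].update(ingredients)
    return dict(by_province)


def jaccard(a, b):
    """Shared ingredients divided by all ingredients of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(recipes):
    prov = province_ingredients(recipes)
    return {(p, q): jaccard(prov[p], prov[q]) for p, q in combinations(sorted(prov), 2)}


def ingredient_frequency(recipes):
    """Number of recipes in which each ingredient appears."""
    return Counter(i for _, ingredients in recipes for i in set(ingredients))
```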

Open Access | Data Descriptor
RAE: The Rainforest Automation Energy Dataset for Smart Grid Meter Data Analysis
Data 2018, 3(1), 8; doi:10.3390/data3010008
Abstract
Datasets are important for researchers to build models and test how well their machine learning algorithms perform. This paper presents the Rainforest Automation Energy (RAE) dataset to help smart grid researchers test algorithms that make use of smart meter data. This initial release of RAE contains 1 Hz data (mains and sub-meters) from two residential houses. In addition to power data, environmental and sensor data from the house's thermostat are included. Sub-meter data from one of the houses include heat pump and rental suite captures, which are of interest to power utilities. We also present an energy breakdown for each house and demonstrate, by example, how RAE can be used to test non-intrusive load monitoring (NILM) algorithms.
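With 1 Hz power readings, energy follows from summing watts times the one-second sample period, and an energy breakdown is each sub-meter's share of mains energy. The Python sketch below assumes plain lists of power samples in watts with no gaps; it is not the authors' processing code, and the sub-meter names in the test are hypothetical.

```python
def energy_kwh(power_w, sample_period_s=1.0):
    """Energy from evenly spaced power samples (rectangle rule), in kWh."""
    return sum(power_w) * sample_period_s / 3.6e6


def breakdown(mains_w, submeters_w, sample_period_s=1.0):
    """Share of mains energy attributed to each sub-meter, plus the unmetered rest."""
    total = energy_kwh(mains_w, sample_period_s)
    if total <= 0:
        raise ValueError("mains energy must be positive")
    shares = {
        name: energy_kwh(samples, sample_period_s) / total
        for name, samples in submeters_w.items()
    }
    shares["unmetered"] = 1.0 - sum(shares.values())
    return shares
```

Real captures may contain dropouts, in which case integrating against actual timestamps is safer than assuming a fixed period.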

Open Access | Article
Uttarakhand Medicinal Plants Database (UMPDB): A Platform for Exploring Genomic, Chemical, and Traditional Knowledge
Data 2018, 3(1), 7; doi:10.3390/data3010007
Abstract
Medicinal plants are the main natural source for primary health care, ethno-medicine, and the traditional Indian systems of medicine. Uttarakhand, also known as the ‘Herbal State’, is a rich source of medicinal plants and traditional medicinal knowledge. A great deal of information about the medicinal plants of Uttarakhand is scattered across different sources and formats. Although many medicinal plant databases are available, there is currently no cohesive, manually curated database of the medicinal plants widely distributed in Uttarakhand state. A comprehensive database, the Uttarakhand Medicinal Plants Database (UMPDB), has therefore been developed. UMPDB provides extensive information on botanical name, common name, taxonomy, genomic taxonomy ID, habit, habitat, location in Uttarakhand, part used, medicinal use, genomic information (including numbers of nucleotides, proteins, and ESTs), chemical information, and scientific literature. The annotated medicinal plants integrated in the current version of the database were collected from existing books, databases, and the available literature. The current version of UMPDB contains 1127 records of medicinal plants belonging to 153 plant families, distributed across 13 districts of Uttarakhand. The primary goal of this database is to provide traditional, genomic, and chemical descriptions of the medicinal plants exclusively found in various regions of Uttarakhand. We anticipate that the information in the database will help users readily obtain the information they need.

Open Access | Editorial
Acknowledgement to Reviewers of Data in 2017
Data 2018, 3(1), 6; doi:10.3390/data3010006
Abstract
Peer review is an essential part of the publication process, ensuring that Data maintains high quality standards for its published papers [...]
Open Access | Data Descriptor
Thirty Thousand 3D Models from Thingiverse
Data 2018, 3(1), 5; doi:10.3390/data3010005
Abstract
This dataset contains files and a geometrical analysis of 3D model data acquired from the Thingiverse online repository. More than thirty thousand stereolithography (STL) files were retrieved and analysed. The geometrical analysis of the models is presented along with model renderings in both GIF and PNG format, and pre-sliced machine instructions as GCode. This dataset is intended as a basis for further research in Additive Manufacturing (AM), such as 3D printing time estimation, printability assessment, or slicing algorithm development. All retrieved files are user-generated, and the respective user and associated licence are listed in the overview. The dataset was acquired between 2016 and 2017.
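A geometrical analysis of an STL mesh typically starts with quantities such as enclosed volume, surface area, and bounding box. The Python sketch below computes these from an ASCII STL. It is an illustration, not the dataset's analysis pipeline: binary STL is not handled, and the volume formula assumes a closed mesh with consistently oriented triangles.

```python
import math


def parse_ascii_stl(text):
    """Read triangles from an ASCII STL; binary STL is not handled here."""
    verts = []
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "vertex":
            verts.append(tuple(float(v) for v in parts[1:4]))
    if len(verts) % 3:
        raise ValueError("vertex count is not a multiple of 3")
    return [tuple(verts[i:i + 3]) for i in range(0, len(verts), 3)]


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def mesh_volume(tris):
    """Sum of signed tetrahedra against the origin (divergence theorem)."""
    return abs(sum(dot(a, cross(b, c)) for a, b, c in tris)) / 6.0


def surface_area(tris):
    """Half the cross-product magnitude of each triangle's edge vectors."""
    normals = (cross(sub(b, a), sub(c, a)) for a, b, c in tris)
    return sum(0.5 * math.sqrt(dot(n, n)) for n in normals)


def bounding_box(tris):
    pts = [p for tri in tris for p in tri]
    return tuple(min(c) for c in zip(*pts)), tuple(max(c) for c in zip(*pts))
```

For a unit right-corner tetrahedron this gives a volume of 1/6 and an area of 1.5 + √3/2.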

Open Access | Data Descriptor
Long-Term WiFi Fingerprinting Dataset for Research on Robust Indoor Positioning
Data 2018, 3(1), 3; doi:10.3390/data3010003
Abstract
WiFi fingerprinting, one of the most popular methods employed in indoor positioning, currently faces two major problems: a lack of robustness to short- and long-term signal changes, and the difficulty of reproducing new methods presented in the literature. This paper presents a WiFi RSS (Received Signal Strength) database created to foster and ease research that addresses these two problems. A trained professional took several consecutive fingerprints while standing at specific positions and facing specific directions. The consecutive fingerprints may enable the study of short-term signal variations. The data collection spanned 15 months, and, for each month, one type of training dataset and five types of test dataset were collected. The measurements of each dataset type (training or test) were taken at the same positions and directions every month, in order to enable the analysis of long-term signal variations. The database is provided with supporting materials and software, which, respectively, give more information about the collection environment and ease use of the database. The WiFi measurements and the supporting materials are available at the Zenodo repository under the open-source MIT license.
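The collection design supports two kinds of analysis: variation across consecutive scans at one spot (short-term) and drift of per-access-point means between months (long-term). The Python sketch below computes both. It assumes each scan is a `{ap_id: rss_dbm}` dict in which undetected access points are simply absent; the released files use their own encoding for undetected APs, which would need to be mapped to this layout first.

```python
from statistics import mean, pstdev


def short_term_std(scans):
    """Per-AP standard deviation of RSS (dBm) across consecutive scans at one spot."""
    aps = set().union(*scans)
    return {
        ap: pstdev([s[ap] for s in scans if ap in s])
        for ap in aps
        if sum(ap in s for s in scans) >= 2  # need at least two readings
    }


def per_ap_mean(scans):
    aps = set().union(*scans)
    return {ap: mean(s[ap] for s in scans if ap in s) for ap in aps}


def long_term_drift(scans_by_month):
    """Shift of each AP's mean RSS relative to the first month."""
    months = sorted(scans_by_month)
    means = {m: per_ap_mean(scans_by_month[m]) for m in months}
    base = means[months[0]]
    return {
        m: {ap: means[m][ap] - base[ap] for ap in base if ap in means[m]}
        for m in months
    }
```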

Open Access | Article
CoeViz: A Web-Based Integrative Platform for Interactive Visualization of Large Similarity and Distance Matrices
Data 2018, 3(1), 4; doi:10.3390/data3010004
Abstract
Similarity and distance matrices are general data structures that describe reciprocal relationships between the objects within a given dataset. Commonly used methods for representing these matrices include heatmaps, hierarchical trees, dimensionality reduction, and various types of networks. However, despite a well-developed foundation for the visualization of such representations, the challenge of creating an interactive view that allows quick data navigation and interpretation remains largely unaddressed. This problem becomes especially evident for large matrices with hundreds or thousands of objects. In this work, we present a web-based platform for the interactive analysis of large (dis-)similarity matrices. It consists of four major interconnected and synchronized components: a zoomable heatmap, an interactive hierarchical tree, a scalable circular relationship diagram, and a 3D multi-dimensional scaling (MDS) scatterplot. We demonstrate the use of the platform for the analysis of amino acid covariance data in proteins as part of our previously developed CoeViz tool. The web platform enables quick and focused analysis of protein features, such as structural domains and functional sites.
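A hierarchical tree like the one in the platform is usually built by agglomerative clustering of a distance matrix. The Python sketch below converts similarities in [0, 1] to distances as `1 - s` (an assumption) and performs naive single-linkage clustering, returning the merge order. The linkage criterion CoeViz actually uses is not stated here. The loop is cubic in the number of objects, so for large matrices a library routine such as `scipy.cluster.hierarchy.linkage` would be the practical choice.

```python
def similarity_to_distance(sim):
    """Assumes similarities lie in [0, 1]; identical objects get distance 0."""
    return [[1.0 - s for s in row] for row in sim]


def single_linkage(dist):
    """Return merges as (cluster_a, cluster_b, distance); new clusters get ids n, n+1, ..."""
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    next_id = len(dist)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # single linkage: closest pair of members across the two clusters
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges
```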

Open Access | Data Descriptor
A Data Set of Human Body Movements for Physical Rehabilitation Exercises
Data 2018, 3(1), 2; doi:10.3390/data3010002
Abstract
The article presents University of Idaho-Physical Rehabilitation Movement Data (UI-PRMD), a publicly available data set of movements related to common exercises performed by patients in physical rehabilitation programs. For the data collection, 10 healthy subjects performed 10 repetitions of different physical therapy movements, which were captured with a Vicon optical tracker and a Microsoft Kinect sensor. The data are in a format that includes the positions and angles of full-body joints. The objective of the data set is to provide a basis for the mathematical modeling of therapy movements, as well as for establishing performance metrics to evaluate patient consistency in executing the prescribed rehabilitation exercises.
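Two building blocks for such performance metrics are a joint angle computed from three joint positions and a measure of how far each repetition deviates from the average movement. The Python sketch below shows both. The RMS-deviation metric is an illustrative assumption, not the metric proposed with UI-PRMD, and it assumes repetitions have already been resampled to equal length.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle at joint b (degrees) between segments b->a and b->c, e.g. hip-knee-ankle."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos))


def rms_deviation_from_mean(reps):
    """RMS distance of each equal-length angle trajectory from the mean trajectory."""
    n = len(reps[0])
    mean_traj = [sum(r[i] for r in reps) / len(reps) for i in range(n)]
    return [math.sqrt(sum((r[i] - mean_traj[i]) ** 2 for i in range(n)) / n) for r in reps]
```

Lower values indicate a repetition closer to the subject's typical execution.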

Open Access | Data Descriptor
World Ocean Isopycnal Level Absolute Geostrophic Velocity (WOIL-V) Inverted from GDEM with the P-Vector Method
Data 2018, 3(1), 1; doi:10.3390/data3010001
Abstract
A three-dimensional dataset of world ocean climatological annual and monthly mean absolute geostrophic velocity on isopycnal levels (called WOIL-V) has been produced from the United States (U.S.) Navy’s Generalized Digital Environmental Model (GDEM) temperature and salinity fields (open access from the website http://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.nodc:9600094) using the P-vector method. The data have a horizontal resolution of 0.5° × 0.5° and 222 isopycnal levels. The 13 data files in total contain the annual and monthly mean values. WOIL-V is the only dataset of absolute geostrophic velocity on isopycnal levels compatible with the GDEM (T, S) fields, and it provides background ocean currents for oceanographic and climatic studies, especially for ocean modeling with the isopycnal coordinate system.
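For readers less familiar with the underlying physics, geostrophic velocity balances the horizontal pressure gradient against the Coriolis force: f v = (1/ρ₀) ∂p/∂x and f u = −(1/ρ₀) ∂p/∂y, with f = 2Ω sin φ. The Python sketch below encodes only this balance and the size of a 0.5° grid cell. It does not implement the P-vector inversion, which is what determines the absolute (reference-level) velocity in WOIL-V. The reference density of 1025 kg/m³ is an assumed value.

```python
import math

OMEGA = 7.2921e-5        # Earth's rotation rate (rad/s)
RHO0 = 1025.0            # assumed reference seawater density (kg/m^3)
EARTH_RADIUS_M = 6.371e6


def coriolis(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude)."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def grid_spacing_m(lat_deg, dlon_deg=0.5, dlat_deg=0.5):
    """East-west and north-south size of one grid cell (0.5 deg by default)."""
    dx = EARTH_RADIUS_M * math.cos(math.radians(lat_deg)) * math.radians(dlon_deg)
    dy = EARTH_RADIUS_M * math.radians(dlat_deg)
    return dx, dy


def geostrophic_velocity(dp_dx, dp_dy, lat_deg, rho0=RHO0):
    """(u, v) in m/s from horizontal pressure gradients in Pa/m."""
    f = coriolis(lat_deg)
    if abs(f) < 1e-6:
        raise ValueError("geostrophic balance is ill-conditioned near the equator")
    return -dp_dy / (rho0 * f), dp_dx / (rho0 * f)
```

A pressure gradient would be estimated on the grid by differencing neighbouring values and dividing by the spacing returned by `grid_spacing_m`.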

Open Access | Article
Investigating the Evolution of Linkage Dynamics among Equity Markets Using Network Models and Measures: The Case of Asian Equity Market Integration
Data 2017, 2(4), 41; doi:10.3390/data2040041
Abstract
The state of cross-market linkage structures and their stability over varying time periods play a key role in the performance of internationally diversified portfolios. Global investors have shown increasing interest in emerging capital markets in the Asian region. In this setting, an investigation into the temporal dynamics of cross-market linkage structures becomes significant for the selection and optimal allocation of securities in an internationally diversified portfolio. To this end, the current study employs weighted network models along with network metrics to decipher the underlying cross-market linkage structures among Asian markets. The study analyses the daily return data of fourteen major Asian indices over a period of 14 years (2002–2016). The topological properties of the network are computed using centrality measures and measures of influence strength, and are investigated over temporal scales. In particular, the overall influence strengths and India-specific influence strengths are computed and examined over a temporal scale. Threshold filtering is also performed to characterize the dynamics of the linkage structure of these networks. The impacts of the 2008 financial crisis on the linkage structural patterns of these equity networks are also investigated. The key findings of this study include a set of central and peripheral indices, the evolution of the linkage structures over the 2002–2016 period, and the linkage dynamics during times of market stress. The study also identifies the set of indices possessing influence over the Asian region in general and over the Indian market in particular. The findings of this study can be utilized for effective systemic risk management and for the selection of an optimally diversified portfolio that is resilient to system-level shocks.
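A common way to build such a weighted network is to use pairwise Pearson correlations of daily returns as edge weights, optionally discard weak edges with a threshold, and use node strength (the sum of absolute edge weights) as a simple centrality. The Python sketch below shows this pipeline. The study's exact weighting, influence-strength measures, and threshold values are not reproduced, and the index names in the test are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_network(returns):
    """Edges {(index_a, index_b): correlation} for every pair of indices."""
    names = sorted(returns)
    return {(a, b): pearson(returns[a], returns[b]) for a, b in combinations(names, 2)}


def threshold_filter(edges, theta):
    """Keep only edges whose absolute correlation is at least theta."""
    return {e: w for e, w in edges.items() if abs(w) >= theta}


def node_strength(edges):
    """Sum of absolute edge weights attached to each node."""
    strength = {}
    for (a, b), w in edges.items():
        strength[a] = strength.get(a, 0.0) + abs(w)
        strength[b] = strength.get(b, 0.0) + abs(w)
    return strength
```

Recomputing these quantities over rolling windows gives the kind of temporal view described above.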

Open Access | Data Descriptor
GasLib—A Library of Gas Network Instances
Data 2017, 2(4), 40; doi:10.3390/data2040040
Abstract
The development of mathematical simulation and optimization models and algorithms for solving gas transport problems is an active field of research. Testing and comparing these models and algorithms requires gas network instances together with demand data. The goal of GasLib is to provide a set of publicly available gas network instances that can be used by researchers in the field of gas transport. The advantages are that researchers save time by using these instances and that different models and algorithms can be compared on the same specified test sets. The library instances are encoded in an XML (Extensible Markup Language) format. In this paper, we explain this format and present the instances that are available in the library.
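Because the instances are XML, a network graph can be loaded with a standard parser and turned into an adjacency structure for a solver. The Python sketch below uses `xml.etree.ElementTree` on a deliberately simplified, hypothetical layout (`<nodes>` with `source`/`innode`/`sink`, `<connections>` with `pipe` elements and a `length_km` attribute). This is not the real GasLib schema, whose element names, namespaces, and units should be taken from the library's own documentation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified layout for illustration only; it is NOT the real GasLib schema.
EXAMPLE = """<network>
  <nodes>
    <source id="S1"/>
    <innode id="N1"/>
    <sink id="T1"/>
  </nodes>
  <connections>
    <pipe id="P1" from="S1" to="N1" length_km="10.0"/>
    <pipe id="P2" from="N1" to="T1" length_km="5.5"/>
  </connections>
</network>"""


def load_network(xml_text):
    """Return ({node_id: node_kind}, {node_id: [(neighbour, length_km), ...]})."""
    root = ET.fromstring(xml_text)
    nodes = {el.get("id"): el.tag for el in root.find("nodes")}
    adjacency = defaultdict(list)
    for pipe in root.find("connections"):
        a, b = pipe.get("from"), pipe.get("to")
        length = float(pipe.get("length_km"))
        adjacency[a].append((b, length))  # stored both ways; flow direction is ignored
        adjacency[b].append((a, length))
    return nodes, dict(adjacency)


def total_length_km(adjacency):
    """Each pipe appears twice in the undirected adjacency lists."""
    return sum(length for edges in adjacency.values() for _, length in edges) / 2
```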

Open Access | Article
Congestion Quantification Using the National Performance Management Research Data Set
Data 2017, 2(4), 39; doi:10.3390/data2040039
Abstract
Monitoring transportation system performance is a key element of any transportation operation and planning strategy. Estimating dependable performance measures relies on analyzing large amounts of traffic data, which are often expensive and difficult to gather. National databases can help in this regard, but challenges remain with respect to data management, accuracy, storage, and use for performance monitoring. To address such challenges, this paper showcases a process that uses the National Performance Management Research Data Set (NPMRDS) to generate performance measures for congestion monitoring applications in the Birmingham region. The capabilities of a relational database management system (RDBMS) are employed to manage the large amounts of NPMRDS data. Visual maps developed with GIS software illustrate congestion location, extent, and severity. Travel time reliability indices are calculated and used to quantify congestion, and congestion intensity measures are developed and employed to rank and prioritize congested segments in the study area. The process for managing and using big traffic data described in the Birmingham case study is an example that small and mid-size Metropolitan Planning Organizations can replicate to generate performance-based measures and monitor congestion in their jurisdictions.
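Widely used travel time reliability indices include the Travel Time Index (mean travel time divided by free-flow travel time), the Planning Time Index (95th-percentile travel time divided by free-flow travel time), and the Buffer Time Index ((95th percentile minus mean) divided by mean). The Python sketch below computes them per segment and ranks segments. The specific indices, the free-flow definition, and the percentile method used in the Birmingham study are not confirmed here; nearest-rank percentiles and a caller-supplied free-flow time are assumptions.

```python
import math
from statistics import mean


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def reliability_indices(travel_times_s, free_flow_s):
    avg = mean(travel_times_s)
    p95 = percentile(travel_times_s, 95)
    return {
        "TTI": avg / free_flow_s,   # Travel Time Index
        "PTI": p95 / free_flow_s,   # Planning Time Index
        "BTI": (p95 - avg) / avg,   # Buffer Time Index
    }


def rank_segments(segment_times, free_flow, index="TTI"):
    """Order segment IDs from most to least congested by the chosen index."""
    scores = {seg: reliability_indices(t, free_flow[seg])[index] for seg, t in segment_times.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

In a database-backed workflow like the one described, the per-segment aggregation would more likely run as SQL, with only the index arithmetic done afterwards.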

Open Access | Data Descriptor
Antibody Exchange: Information Extraction of Biological Antibody Donation and a Web-Portal to Find Donors and Seekers
Data 2017, 2(4), 38; doi:10.3390/data2040038
Abstract
Bio-molecular reagents required in experimental biology, such as antibodies, are expensive, and their effectiveness, among other things, is critical to the success of an experiment. Although such resources are sometimes donated by one investigator to another through personal communication, to our knowledge there has been no previous study of the extent of such donations, nor is there a central platform that directs resource seekers to donors. In this paper, we describe what we believe is the first attempt at building a web portal, titled Antibody Exchange (or, more generally, ‘Bio-Resource Exchange’), that aims to bridge this gap between resource seekers and donors in the domain of experimental biology. Users of this portal can request or donate antibodies, cell lines, and DNA constructs. This resource could also serve as a crowd-sourced database of resources for experimental biology. Further, we studied the extent of antibody donations by mining the acknowledgement sections of scientific articles. Specifically, by parsing the acknowledgement sections of articles, we extracted the donor's name, the donor's affiliation, and the name of the antibody for every donation. To extract annotations at this level, we adopted two approaches: a rule-based algorithm and a bootstrapped pattern learning algorithm. The algorithms extracted donor names, affiliations, and antibody names with average accuracies of 57% and 62%, respectively. We also created a dataset of 50 expert-annotated acknowledgement sections that will serve as a gold standard for evaluating extraction algorithms in the future.
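To make the rule-based approach concrete, the Python sketch below applies two hand-written regular expressions to acknowledgement sentences and returns the donor, affiliation, and antibody for each match. The two sentence patterns and the example names are hypothetical; the authors' actual rules and their bootstrapped pattern learner are not reproduced. Real acknowledgements are far more varied, which is consistent with the modest accuracies reported above.

```python
import re

NAME = r"(?:Dr\.\s)?(?P<donor>[A-Z][\w.]*(?:\s[A-Z][\w.]*)+)"
AFFIL = r"(?:\s\((?P<affiliation>[^)]+)\))?"
ANTIBODY = r"(?P<antibody>anti-[\w-]+)"

RULES = [
    # e.g. "The anti-GFP antibody was a gift from Dr. Jane Smith (University of X)."
    re.compile(ANTIBODY + r" antibody was (?:a (?:kind )?gift|kindly provided) (?:from|by) " + NAME + AFFIL),
    # e.g. "We thank Dr. John Doe (Institute Y) for the anti-actin antibody."
    re.compile(r"[Ww]e thank " + NAME + AFFIL + r" for (?:the |providing the )?" + ANTIBODY + r" antibod(?:y|ies)"),
]


def extract_donations(text):
    """Return one {donor, affiliation, antibody} dict per rule match (affiliation may be None)."""
    found = []
    for rule in RULES:
        for match in rule.finditer(text):
            found.append(match.groupdict())
    return found
```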

Open Access | Article
Regionalization of a Landscape-Based Hazard Index of Malaria Transmission: An Example of the State of Amapá, Brazil
Data 2017, 2(4), 37; doi:10.3390/data2040037
Abstract
Identifying and assessing the relative effects of the numerous determinants of malaria transmission, at different spatial scales and resolutions, is of primary importance in defining control strategies and reaching the goal of malaria elimination. In this context, based on a knowledge-based model, a normalized landscape-based hazard index (NLHI) was established at a local scale, using a 10 m spatial resolution forest vs. non-forest map, landscape metrics, and a spatial moving window. Such an index evaluates the contribution of the landscape to the probability of encounters between humans and malaria vectors, and thus to malaria transmission risk. Since the knowledge-based model is tailored to the entire Amazon region, such an index might be generalized to large scales to establish a regional view of the landscape contribution to malaria transmission. Thus, this study uses an open large-scale land use and land cover dataset (i.e., the 30 m TerraClass maps) and proposes an automatic data-processing chain for implementing NLHI at large scale. First, the impact of a coarser spatial resolution (i.e., 30 m) on NLHI values was studied. Second, the data-processing chain was established using the R language to customize the spatial moving window and compute the landscape metrics and NLHI at large scale. This paper presents the results for the State of Amapá, Brazil. The approach makes it possible to monitor a significant determinant of malaria transmission at the regional scale.
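The moving-window step can be illustrated with two simple landscape metrics computed around every cell of a binary forest/non-forest raster: the forest share inside the window and the length of forest/non-forest edges. The study implemented its chain in R; the Python sketch below only illustrates the windowing idea. These two metrics are illustrative choices, and the knowledge-based combination that yields NLHI is not reproduced.

```python
def moving_window_metrics(grid, radius, cell_size_m=30.0):
    """Per-cell forest share and forest/non-forest edge length in a square window.

    grid: 2-D list of 0 (non-forest) / 1 (forest); radius in cells.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        r0, r1 = max(0, r - radius), min(rows - 1, r + radius)
        for c in range(cols):
            c0, c1 = max(0, c - radius), min(cols - 1, c + radius)
            cells = forest = edges = 0
            for i in range(r0, r1 + 1):
                for j in range(c0, c1 + 1):
                    cells += 1
                    forest += grid[i][j]
                    # count each adjacency once (right and down neighbours inside the window)
                    if j < c1 and grid[i][j] != grid[i][j + 1]:
                        edges += 1
                    if i < r1 and grid[i][j] != grid[i + 1][j]:
                        edges += 1
            out[r][c] = {"forest_share": forest / cells, "edge_length_m": edges * cell_size_m}
    return out
```

Windows are clipped at the raster border, so edge cells see smaller neighbourhoods; production code would typically pad or mask instead.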

Open Access | Data Descriptor
Database of Himalayan Plants Based on Published Floras during a Century
Data 2017, 2(4), 36; doi:10.3390/data2040036
Abstract
The Himalaya is the largest mountain range in the world, spanning approximately ten degrees of latitude and elevations from 100 m asl to the highest mountain peak on Earth. Plant species richness varies across the region: it is highest in the biodiversity hotspot of the Eastern Himalaya and declines towards the north-western parts of the Himalaya. We examined all published floras (31 floras in 42 volumes spanning the years 1903–2014) from the Indian Himalayan region, Nepal, and Bhutan to compile a comprehensive checklist of all gymnosperms and angiosperms. A total of 10,503 species representing 240 families and 2322 genera are reported. We evaluated all the botanical names reported in the floras for their updated taxonomy and excluded more than 3000 synonyms. Additionally, we identified 1134 species reported in these floras that presently remain taxonomically unresolved and 160 species with missing information in the global plant database (The Plant List, 2013). This is the most comprehensive estimate of plant species diversity in the Himalaya.
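The checklist workflow involves three steps: collapse synonyms onto accepted names, set aside unresolved names and names missing from the reference list, and count species, genera, and families. The Python sketch below illustrates these steps. The `taxonomy` mapping stands in for a lookup against a reference such as The Plant List; its structure and all plant names in the test are hypothetical placeholders.

```python
def resolve_names(reported, taxonomy):
    """Sort reported names into accepted species, unresolved names, and names with no record.

    taxonomy maps a reported name to its accepted name (itself if accepted,
    another name if it is a synonym) or to None if its status is unresolved.
    """
    accepted, unresolved, missing = set(), set(), set()
    for name in reported:
        if name not in taxonomy:
            missing.add(name)
        elif taxonomy[name] is None:
            unresolved.add(name)
        else:
            accepted.add(taxonomy[name])  # synonyms collapse onto their accepted name
    return accepted, unresolved, missing


def summarize(accepted, family_of_genus):
    """Count species, genera (first word of the binomial), and families."""
    genera = {name.split()[0] for name in accepted}
    families = {family_of_genus[g] for g in genera}
    return {"species": len(accepted), "genera": len(genera), "families": len(families)}
```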