KVMod—A Novel Approach to Design Key-Value NoSQL Databases
Abstract
1. Introduction
- They adopt flexible data models, most of them schema-less;
- Eventual consistency is reached by relaxing the ACID (Atomicity, Consistency, Isolation, and Durability) properties, which allows these systems to scale out while achieving high availability and low latency;
- Query performance is achieved not only through data co-location but also through horizontal and elastic scalability;
- Data can be easily replicated and horizontally partitioned over remote and local servers.
- Key-value databases store data as a dictionary. Every item in the database is stored as a pair <k,v>, where k is the key (representing an attribute name) and v is its value. Key-value databases offer high performance for read and write operations. Redis and Riak KV are popular systems in this category;
- Document-oriented databases extend the key-value concepts by representing the value as a document encoded in a conventional semi-structured format (such as JSON). The advantage of this model is that a set of hierarchically structured information can be retrieved with a single key. MongoDB and CouchDB are document-oriented systems;
- Column-oriented databases: in a database of this family, data are grouped into column families whose schemas are flexible. A column family contains a set of columns. A column has a name, a timestamp, and a value with a complex or simple structure. Each column is stored in a separate location. Cassandra and HBase are examples of column-oriented systems;
- Graph databases represent a database as a graph structure. A graph is composed of a set of nodes (i.e., objects) and a set of edges describing the relationships between the nodes. These databases are efficient for strongly connected data, in which each node is reachable from the others. Neo4j is one of these systems.
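To make the difference between the first two families concrete, the sketch below uses plain Python dictionaries as stand-ins for a key-value store and a document store. The key names (such as `passenger:1:firstName`) and the sample values are hypothetical and chosen only for illustration; real systems such as Redis or MongoDB expose their own client APIs.

```python
import json

# Key-value model: the store behaves like a dictionary of <k, v> pairs.
# Each (hypothetical) key names one attribute of one passenger; each value is opaque.
kv_store: dict[str, str] = {
    "passenger:1:firstName": "Amina",
    "passenger:1:lastName": "Alaoui",
}

# Document model: the value is a semi-structured JSON document, so a single key
# gives access to a whole hierarchy of related information.
doc_store: dict[str, str] = {
    "passenger:1": json.dumps({
        "firstName": "Amina",
        "lastName": "Alaoui",
        "flights": [{"code": "F100", "origin": {"city": "Casablanca"}}],
    })
}

# One attribute, one lookup.
print(kv_store["passenger:1:firstName"])  # Amina

# One lookup returns the nested structure; navigation happens client-side here.
doc = json.loads(doc_store["passenger:1"])
print(doc["flights"][0]["origin"]["city"])  # Casablanca
```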
2. Background
2.1. Running Example
- Q1 Registration number and capacity of aircraft whose capacity is within a given interval.
- Q2 Airports of a country (name, I.C.A.O code, and city) sorted in ascending order of city.
- Q3 Full name and passport number of passengers on a specific flight.
- Q4 List of passengers departing from a country on a specific date, sorted in ascending order of departure city, then in ascending order of departure time; the destination city must also be displayed.
- Q5 Passengers departing on a specific date, sorted in ascending order of departure country and then in ascending order of departure city. The flight code and the departure time must be displayed.
- Q6 List of websites accessed by passengers on a given flight. The list is sorted in ascending order of passenger ID, then in descending chronological order of the access time.
- Q7 Aircraft that departed from an airport during a given period. The departure date and time must be displayed.
- Q8 Location data (geographical coordinates, crossing date, and time) of an aircraft used in a flight, in descending chronological order.
- Q9 List of flights on a specific date, sorted in ascending order of flight level. Departure and arrival cities must be displayed.
2.2. Key-Value Family Stores
- In-memory key-value systems, which store data in memory to allow particularly fast access, such as Memcached;
- Persistent key-value systems, which use disk to store data, such as Riak KV;
- Hybrid key-value systems, which keep data in memory and persist it to disk when necessary, such as Redis.
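The hybrid category can be illustrated with a toy store that serves reads and writes from memory and writes a snapshot to disk only when asked. This is a simplified sketch under our own assumptions (a JSON file as the snapshot format, an explicit `save()` call, the hypothetical key `aircraft:1:capacity`); it is not how Redis implements its RDB or AOF persistence.

```python
from __future__ import annotations

import json
import os
import tempfile


class HybridStore:
    """Toy hybrid key-value store: reads and writes go to an in-memory dict,
    and save() writes a snapshot to disk when the caller decides it is needed."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.data: dict[str, str] = {}
        if os.path.exists(path):  # reload the last snapshot, if any
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value  # memory only: lost on a crash until save() runs

    def get(self, key: str) -> str | None:
        return self.data.get(key)

    def save(self) -> None:
        # Write to a temporary file, then swap it in so that a half-written
        # file never replaces the previous snapshot.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)


snapshot = os.path.join(tempfile.mkdtemp(), "store.json")
store = HybridStore(snapshot)
store.set("aircraft:1:capacity", "180")
store.save()

reloaded = HybridStore(snapshot)  # simulates a restart
print(reloaded.get("aircraft:1:capacity"))  # 180
```

Anything written after the last `save()` would be lost on a restart, which is the trade-off hybrid systems make between in-memory speed and durability.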
3. Related Works
4. Proposed Methodology
4.1. General Overview
- We proposed that the two metamodels be placed together in the Platform-Independent Data Metamodel (PIDM), a conceptual-level metamodel in which both data and access queries are incorporated;
- The advantage is to avoid the complexity that arises when each model is developed separately from the other, which can lead to two models that drift apart;
- Due to its platform independence, many NoSQL paradigms (including the key-value one) can use this metamodel as input.
- An instance of the PIDM metamodel is provided as input to the transformation process.
- The instance is checked against the metamodel specifications to ensure that it is error-free.
- Then, an M2M (model-to-model) transformation translates this instance into an instance of the KVLM by applying a set of transformation rules.
- Finally, in the third step of the process, an M2T (model-to-text) transformation generates a physical implementation for the targeted technology (i.e., how data are structured in the target key-value DBMS).
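The three steps can be pictured as a small pipeline. The sketch below is hypothetical and heavily simplified: the class and function names, the single validation rule (every query's FROM entity must exist), the one-collection-per-query M2M rule, and the textual output format are our own assumptions, not KVMod's actual transformation rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PIDMQuery:
    name: str
    from_entity: str
    fields: list[str]  # projection, selection, and sorting attributes


@dataclass
class PIDMInstance:
    entities: set[str]
    queries: list[PIDMQuery]


def validate(model: PIDMInstance) -> None:
    """Step 1: reject instances that violate an (illustrative) metamodel rule."""
    for q in model.queries:
        if q.from_entity not in model.entities:
            raise ValueError(f"{q.name}: unknown FROM entity {q.from_entity!r}")


def m2m(model: PIDMInstance) -> dict[str, list[str]]:
    """Step 2 (M2M): map each query to one logical collection (name -> fields)."""
    return {f"{q.name}_collection": q.fields for q in model.queries}


def m2t(collections: dict[str, list[str]]) -> str:
    """Step 3 (M2T): emit a textual description of the physical structures."""
    return "\n".join(
        f"collection {name}: {', '.join(fields)}" for name, fields in collections.items()
    )


def transform(model: PIDMInstance) -> str:
    validate(model)
    return m2t(m2m(model))


pidm = PIDMInstance(
    entities={"Aircraft"},
    queries=[PIDMQuery("Q1", "Aircraft", ["registrationNumber", "capacity"])],
)
print(transform(pidm))  # collection Q1_collection: registrationNumber, capacity
```

Keeping validation as a separate first step means the M2M and M2T functions can assume a well-formed input instead of re-checking it.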
4.2. Platform-Independent Data Metamodel (PIDM)
- The structural model (Figure 5, left). It is defined in a UML-like syntax, a notation widely used by developers and researchers in data modeling. This notation is adequate to capture and structure the data requirements of the domain using the following rules:
  - The data are grouped into entities;
  - An entity contains attributes, which represent the data of its occurrences;
  - An entity may have references to other entities;
  - A reference can have a constant cardinality, e.g., 0, 1, 2, 4, or an unlimited cardinality, denoted as *.
- The query model (Figure 5, right), which represents the queries that will be sent to the database. Using an SQL-like syntax, these queries are defined in the PIDM metamodel over the entities of the structural model. Navigation through these entities is performed by traversing their references. A query consists of the following clauses:
  - FROM. This clause specifies the main entity on which the query is executed;
  - INCLUDE. If the query needs other entities, references of the main entity can be added as inclusion elements. Inclusions can be added recursively as long as references are available, which means that entities referenced in an inclusion can themselves be incorporated;
  - SELECT. It is used for the projection operation (i.e., the set of attributes to be retrieved by the query). The attributes to retrieve can come from the main entity or from the included entities;
  - WHERE. Using this keyword, a query can contain a boolean expression to filter the occurrences that satisfy a given condition;
  - ORDERBY. As in SQL, it specifies the sorting attributes of a query result. The ordering can be ascending or descending. In multi-criteria ordering, priorities are expressed using weights assigned to the sorting attributes. The weight is an integer that indicates the priority of the attribute: for example, 1 for the most important sorting attribute, 2 for the second-most important attribute, etc.;
  - AS. It is used to give an attribute an easily identifiable name in a query, and it can rename references as well. The alias of an attribute can change from one query to another; therefore, an association class named alias is placed between the query and attribute classes in the metamodel.
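The structural and query models can be mirrored with a few Python dataclasses. This is a hypothetical, simplified rendering of the PIDM concepts listed above, not the actual metamodel of Figure 5: the class names (`Entity`, `Reference`, `OrderBy`, `Query`), attribute names such as `icaoCode`, and the reference names `flights`, `origin`, and `destination` are our own, and the WHERE clause is kept as plain text rather than a parsed boolean expression.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union

Cardinality = Union[int, str]  # a non-negative int, or "*" for unlimited


@dataclass
class Entity:
    name: str
    attributes: list[str] = field(default_factory=list)
    references: dict[str, Reference] = field(default_factory=dict)


@dataclass
class Reference:
    target: Entity
    cardinality: Cardinality = 1

    def __post_init__(self) -> None:
        ok = self.cardinality == "*" or (
            isinstance(self.cardinality, int) and self.cardinality >= 0
        )
        if not ok:
            raise ValueError(f"invalid cardinality: {self.cardinality!r}")


@dataclass
class OrderBy:
    attribute: str
    ascending: bool = True
    weight: int = 1  # 1 = most important sorting attribute


@dataclass
class Query:
    from_entity: Entity                                    # FROM
    include: list[str] = field(default_factory=list)       # INCLUDE (reference paths)
    select: list[str] = field(default_factory=list)        # SELECT
    where: str = ""                                         # WHERE (plain text here)
    order_by: list[OrderBy] = field(default_factory=list)  # ORDERBY
    aliases: dict[str, str] = field(default_factory=dict)  # AS: path -> alias

    def sort_criteria(self) -> list[OrderBy]:
        """ORDERBY criteria from highest to lowest priority (ascending weight)."""
        return sorted(self.order_by, key=lambda o: o.weight)


airport = Entity("Airport", ["name", "icaoCode", "city", "country"])
flight = Entity("Flight", ["code", "departureDate", "departureTime"])
flight.references["origin"] = Reference(airport, 1)
flight.references["destination"] = Reference(airport, 1)
passenger = Entity("Passenger", ["idPassport", "firstName", "lastName"])
passenger.references["flights"] = Reference(flight, "*")

q = Query(
    from_entity=passenger,
    include=["flights", "flights.origin", "flights.destination"],
    select=["Origin.city", "FL.departureTime", "idPassport"],
    where="FL.departureDate = ? AND Origin.country = ?",
    order_by=[OrderBy("FL.departureTime", weight=2), OrderBy("Origin.city", weight=1)],
    aliases={"flights": "FL", "flights.origin": "Origin", "flights.destination": "Destination"},
)
print([o.attribute for o in q.sort_criteria()])  # ['Origin.city', 'FL.departureTime']
```

Storing weights explicitly, rather than relying on list order, keeps the priority information intact even if the criteria are declared in a different order.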
4.3. Running Example in the PIDM
- The information to display is: origin and destination cities, flight departure time, and passenger data (passport ID, first and last names, birthdate, sex, and nationality);
- The main entity is Passenger;
- The Flight and Airport entities are included;
- The query filters the results with a boolean expression on the flight departure date and country attributes. The expression contains two equality conditions combined with the AND operator;
- Finally, using the ORDERBY clause, the query result is sorted in ascending order of the departure city attribute, then in ascending order of the flight departure time attribute.
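To show what this query (which matches Q4) returns, the sketch below evaluates it over a few already-joined rows. The rows, dates, and names are invented for illustration, and only a subset of the projection attributes is kept to keep the example short; this is a semantic illustration of the clauses, not how KVMod executes queries.

```python
# Hypothetical rows, already joined across Passenger, Flight (FL) and Airport
# (Origin/Destination); the values are invented for illustration only.
ROWS = [
    {"idPassport": "P3", "firstName": "Sara", "Origin.city": "Rabat",
     "Origin.country": "Morocco", "Destination.city": "Paris",
     "FL.departureDate": "2023-07-14", "FL.departureTime": "09:30"},
    {"idPassport": "P1", "firstName": "Amina", "Origin.city": "Casablanca",
     "Origin.country": "Morocco", "Destination.city": "Madrid",
     "FL.departureDate": "2023-07-14", "FL.departureTime": "18:00"},
    {"idPassport": "P2", "firstName": "Omar", "Origin.city": "Casablanca",
     "Origin.country": "Morocco", "Destination.city": "Lisbon",
     "FL.departureDate": "2023-07-14", "FL.departureTime": "07:15"},
    {"idPassport": "P4", "firstName": "Lucia", "Origin.city": "Madrid",
     "Origin.country": "Spain", "Destination.city": "Rabat",
     "FL.departureDate": "2023-07-14", "FL.departureTime": "08:00"},
]

# A subset of the projection attributes listed above.
PROJECTION = ["Origin.city", "Destination.city", "FL.departureTime", "idPassport", "firstName"]


def passengers_departing(rows: list[dict[str, str]], date: str, country: str) -> list[dict[str, str]]:
    # WHERE: two equality conditions combined with AND.
    selected = [r for r in rows
                if r["FL.departureDate"] == date and r["Origin.country"] == country]
    # ORDERBY: Origin.city ascending (weight 1), then FL.departureTime ascending (weight 2).
    # Zero-padded "HH:MM" strings sort correctly as text.
    selected.sort(key=lambda r: (r["Origin.city"], r["FL.departureTime"]))
    # SELECT: keep only the projected attributes.
    return [{a: r[a] for a in PROJECTION} for r in selected]


result = passengers_departing(ROWS, "2023-07-14", "Morocco")
print([r["idPassport"] for r in result])  # ['P2', 'P1', 'P3']
```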
4.4. Key-Value Stores Metamodel
- The key-value data model metaclass is the entry point to this metamodel;
- A key-value data model is considered a database schema that contains collection specifications;
- Each collection is a set of associative arrays;
- Identified by a unique key, an associative array is used to store a two-column matrix:
  - This matrix can be seen as a set of item pairs;
  - Each pair contains a field and a value.
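The containment hierarchy of this metamodel (data model, collections, associative arrays, field/value pairs) can be sketched as nested Python dataclasses. The class names follow the text, but the attribute names, the `matrix()` helper, and the duplicate-key check are our own assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AssociativeArray:
    key: str                                             # unique identifier
    pairs: dict[str, str] = field(default_factory=dict)  # field -> value

    def matrix(self) -> list[tuple[str, str]]:
        """The two-column (field, value) matrix view of this array."""
        return list(self.pairs.items())


@dataclass
class Collection:
    name: str
    arrays: dict[str, AssociativeArray] = field(default_factory=dict)

    def add(self, array: AssociativeArray) -> None:
        # Keys must be unique within the collection.
        if array.key in self.arrays:
            raise ValueError(f"duplicate key {array.key!r} in collection {self.name!r}")
        self.arrays[array.key] = array


@dataclass
class KeyValueDataModel:
    """Entry point of the metamodel: a schema made of collections."""
    collections: dict[str, Collection] = field(default_factory=dict)


model = KeyValueDataModel()
q4 = model.collections.setdefault("q4", Collection("q4"))
q4.add(AssociativeArray("q4:1", {"idPassport": "P1", "Origin.city": "Casablanca"}))
print(q4.arrays["q4:1"].matrix())  # [('idPassport', 'P1'), ('Origin.city', 'Casablanca')]
```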
4.5. Transformations of a PIDM Instance to a Key-Value Logical Data Model
4.5.1. Query to Collection Transformation
- At the beginning, this collection would contain nine fields: the projection attributes Origin.city, Destination.city, departureTime, idPassport, firstName, lastName, birthdate, sex, and nationality;
- The selection attributes FL.departureDate and Origin.country must be added to the collection;
- The sorting criteria of the query (i.e., Origin.city and FL.departureTime) are also part of the initial fields; since they already appear among the projection attributes, they are not added a second time (there is no point in storing the same attribute more than once in a collection).
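The field-gathering rule in the list above can be sketched as an order-preserving union of the projection, selection, and sorting attributes. This is not Algorithm 1 itself, only an illustration of the rule as stated; the departure time is written as `FL.departureTime` in both the projection and the sorting lists (our assumption) so that the two references match.

```python
def collection_fields(projection: list[str], selection: list[str], sorting: list[str]) -> list[str]:
    """Projection, then selection, then sorting attributes, each kept once.

    dict.fromkeys preserves first-seen order and drops later duplicates.
    """
    return list(dict.fromkeys([*projection, *selection, *sorting]))


projection = ["Origin.city", "Destination.city", "FL.departureTime", "idPassport",
              "firstName", "lastName", "birthdate", "sex", "nationality"]
selection = ["FL.departureDate", "Origin.country"]
sorting = ["Origin.city", "FL.departureTime"]

fields = collection_fields(projection, selection, sorting)
print(len(fields))   # 11
print(fields[-2:])   # ['FL.departureDate', 'Origin.country']
```

The nine projection attributes plus the two selection attributes give eleven fields; the two sorting attributes add nothing because they are already present.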
Algorithm 1: Query to collection transformation rule
4.5.2. Query Merging
Algorithm 2: Collection merging technique
Algorithm 3: Schema optimization method
4.6. Logical Model to Text Model Transformation
- The key space of a key-value system can contain several databases;
- A database is composed of collections;
- A collection contains HSets representing the physical implementation of associative arrays;
- An HSet is a set of pairs <field,value> that includes a special field, called the HSet key, which identifies the HSet within the system.
- Obtain the value of the variable ;
- Create a new HSet with the identifier key ;
- Increment the variable .
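The physical structures and the three HSet-creation steps can be simulated in memory. In this hypothetical sketch, the counter variable (whose name is not given in the text above) is kept per collection and starts at 1, HSet keys follow a `collection:counter` pattern, and logical databases are identified by number as in Redis; all of these are our assumptions. The helper then renders each HSet as a Redis `HSET key field value [field value ...]` command without any escaping, so it assumes values contain no double quotes.

```python
from __future__ import annotations


class KeySpace:
    """Toy model of the physical level: databases -> collections -> HSets."""

    def __init__(self) -> None:
        # databases[db][collection][hset_key] = {field: value}
        self.databases: dict[int, dict[str, dict[str, dict[str, str]]]] = {}
        self.counters: dict[tuple[int, str], int] = {}  # hypothetical per-collection counter

    def insert(self, db: int, collection: str, pairs: dict[str, str]) -> str:
        hsets = self.databases.setdefault(db, {}).setdefault(collection, {})
        counter = self.counters.get((db, collection), 1)  # 1. obtain the counter value
        key = f"{collection}:{counter}"                   # 2. create an HSet with that key
        hsets[key] = dict(pairs)
        self.counters[(db, collection)] = counter + 1     # 3. increment the counter
        return key


def to_redis_commands(collection_hsets: dict[str, dict[str, str]]) -> list[str]:
    """Render each HSet as one HSET command with all its field/value pairs."""
    commands = []
    for key, pairs in collection_hsets.items():
        args = " ".join(f'{f} "{v}"' for f, v in pairs.items())
        commands.append(f"HSET {key} {args}")
    return commands


ks = KeySpace()
k1 = ks.insert(0, "q4", {"idPassport": "P1", "firstName": "Amina"})
k2 = ks.insert(0, "q4", {"idPassport": "P2", "firstName": "Sara"})
print(k1, k2)  # q4:1 q4:2
for cmd in to_redis_commands(ks.databases[0]["q4"]):
    print(cmd)
# HSET q4:1 idPassport "P1" firstName "Amina"
# HSET q4:2 idPassport "P2" firstName "Sara"
```

In a running Redis server, steps 1 and 3 would more naturally be folded into a single atomic `INCR` on a counter key, which returns the incremented value and avoids races between concurrent writers.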
5. KVMod’s Implementation and Assessment
5.1. Implementation
5.2. Assessment
5.2.1. Computational Resources
Query to Collection Transformation Algorithm
Collections Merging Algorithm
Schema Optimization Algorithm
Conceptual Model to Key-Value Logical Model Transforming Process
5.2.2. Applicability to Other Systems
6. Conclusions
- The literature review shows that designing NoSQL databases can help standardize data access and understand how data are stored;
- The combined MDA-based and query-driven methodology used in the current study holds several advantages for both researchers and practitioners. The use of MDA helps automate the modeling process, and supporting access queries is in line with best practices in NoSQL database design;
- The proposal introduces a series of models at different levels so that the process can be enriched, especially at the logical and physical levels;
- We described a robust data-modeling tool, named KVDesign, that automates some of the most time-consuming data-modeling tasks, including conceptual-to-logical transformations and code generation.
- Firstly, the study supports only read queries, which are very important in a NoSQL context, but other operations like updates and insertions were not treated. For future research, we plan to extend our work to support all CRUD queries, including the aggregation operations.
- Secondly, key-value DBMSs offer several data structures, including hashes, which are the only ones used in this work. We intend to study how to support other types, such as sorted sets, to cover as many useful elements of data design as possible.
- Thirdly, a KVMod-based software tool was developed to design key-value databases. In the future, we plan to let practitioners and designers test it in different use cases and collect their feedback in order to improve the design of key-value databases.
- Finally, because DBMSs of the same family are similar, we plan to study database modeling in other key-value stores, building in particular on the conceptual and logical metamodels introduced in our proposal.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ACID | Atomicity, Consistency, Isolation, and Durability |
ATL | Atlas Transformation Language |
CRUD | Create, Read, Update, and Delete |
DBMS | Database Management System |
DDL | Data Definition Language |
DSL | Domain-Specific Language |
EGL | Epsilon Generation Language |
ERD | Entity Relationship Diagram |
ETL | Epsilon Transformation Language |
GPS | Global Positioning System |
HSet | Hash Set |
I.C.A.O | International Civil Aviation Organization |
KVDesign | Design Tool for Key-Value Design |
KVLM | Key-Value Logical Metamodel |
KVMod | Key-Value Modeling |
M2M | Model-to-model |
M2T | Model-to-text |
MDA | Model-Driven Architecture |
NoSQL | Not Only SQL |
OCL | Object Constraint Language |
PIDM | Platform-Independent Data Metamodel |
QVT | Query/View/Transformation |
RDBMS | Relational DBMS |
SQL | Structured Query Language |
UML | Unified Modeling Language |
XML | Extensible Markup Language |
References
- Dourhri, A.; Hanine, M.; Ouahmane, H. A New Algorithm for Data Migration from a Relational to a NoSQL Oriented Column Database. In Proceedings of the International Conference on Smart City Applications (SCA21), Safranbolu, Turkey, 28 October 2021.
- Akoka, J.; Comyn-Wattiau, I.; Laoufi, N. Research on Big Data—A systematic mapping study. Comput. Stand. Interfaces 2017, 54, 105–115.
- Corbellini, A.; Mateos, C.; Zunino, A.; Godoy, D.; Schiaffino, S. Persisting bigdata: The NoSQL landscape. Inf. Syst. 2017, 63, 1–23.
- Davoudian, A.; Chen, L.; Liu, L. A survey on NoSQL stores. ACM Comput. Surv. 2018, 51, 1–46.
- Diogo, M.; Cabral, B.; Bernardino, J. Consistency Models of NoSQL Databases. Future Internet 2019, 11, 43.
- Asadi, M.; Ramsin, R. MDA-Based Methodologies: An Analytical Survey. In Proceedings of the European Conference on Model Driven Architecture—Foundations and Applications, Berlin, Germany, 9–13 June 2008.
- Atzeni, P. Data Modelling in the NoSQL world: A contradiction? In Proceedings of the 17th International Conference on Computer Systems and Technologies, Palermo, Italy, 23–24 June 2016.
- Kaur, K.; Rani, R. Modeling and querying data in NoSQL databases. In Proceedings of the International Conference on Big Data (IEEE), Silicon Valley, CA, USA, 6–9 October 2013.
- Chebotko, A.; Kashlev, A.; Lu, S. A Big Data Modeling Methodology for Apache Cassandra. In Proceedings of the IEEE International Congress on Big Data, Silicon Valley, CA, USA, 27 June–2 July 2015.
- Roy-Hubara, N.; Rokach, L.; Shapira, B.; Shoval, P. Modeling Graph Database Schema. IT Prof. 2017, 19, 34–43.
- Hanine, M.; Lachgar, M.; Lachgar, S.; Elmahfoudi, O.; Boutkhoum, O. MDA Approach for Designing and Developing Data Warehouses: A Systematic Review & Proposal. Int. J. Online Biomed. Eng. 2021, 17, 99–110.
- De la Vega, A.; García-Saiz, D.; Blanco, C.; Marta, Z.; Pablo, S. Mortadelo: Automatic generation of NoSQL stores from platform-independent data models. Future Gener. Comput. Syst. 2020, 105, 455–474.
- Abdelhedi, F.; Ait Brahim, A.; Atigui, F.; Zurfluh, G. MDA-Based Approach for NoSQL Databases Modelling. In Proceedings of the 19th International Conference on Big Data Analytics and Knowledge Discovery, Lyon, France, 28–31 August 2017.
- Abdelhedi, F.; Ait Brahim, A.; Zurfluh, G. Applying a Model-Driven Approach for UML/OCL Constraints: Application to NoSQL Databases. In Proceedings of the Confederated International Conferences “On the Move to Meaningful Internet Systems”, Rhodes, Greece, 21–25 October 2019.
- Liu, S.; Rahman, M.R.; Skeirik, S.; Gupta, I.; Meseguer, J. Formal Modeling and Analysis of Cassandra in Maude. In Formal Methods and Software Engineering. ICFEM 2014. Lecture Notes in Computer Science; Merz, S., Pang, J., Eds.; Springer: Cham, Switzerland, 2014; Volume 8829.
- Neeru; Kaur, B. Cassandra vs. MySQL: Modelling and querying format. IJCTA J. 2016, 9, 5199–5206.
- Shashank, T. Professional NoSQL, 1st ed.; Wrox: Birmingham, UK, 2011.
- Carlson, J.L. Redis in Action, 1st ed.; Manning Publications: Greenwich, CT, USA, 2013.
- Redis Official Documentation. Available online: https://redis.io/docs/manual (accessed on 28 April 2023).
- Das, V. Learning Redis, 1st ed.; Packt Publishing: Birmingham, UK, 2015.
- Gwendal, D.; Gerson, S.; Jordi, C.; Skeirik, S. UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases. In Proceedings of the 35th International Conference on Conceptual Modeling, Gifu, Japan, 14–17 November 2016.
- Rossel, G.; Manna, A. A Modeling methodology for NoSQL Key-Value databases. Database Syst. J. 2017, 8, 12–18.
- Li, C. Transforming relational database into HBase: A case study. In Proceedings of the IEEE International Conference on Software Engineering and Service Sciences, Beijing, China, 16–18 July 2010.
- Imam, A.A.; Basri, S.; Ahmad, R.; Watada, J.; González-Aparicio, M.T.; Almomani, M.A. Data Modeling Guidelines for NoSQL Document-Store Databases. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 544–555.
- de Lima, C.; dos Santos Mello, R. A workload-driven logical design approach for NoSQL document databases. In Proceedings of the 17th International Conference on Information Integration and Web-Based Applications & Services, New York, NY, USA, 11–13 December 2015.
- Chen, P.P. The entity-relationship model—Toward a unified view of data. ACM Trans. Database Syst. 1976, 1, 9–36.
- Schroeder, R.; Duarte, D.; Mello, R.S. A workload-aware approach for optimizing the xml schema design trade-off. In Proceedings of the 13th International Conference on Information Integration and Web-Based Applications and Services, Ho Chi Minh City, Vietnam, 5–7 December 2011.
- Fernández Candel, C.; Sevilla, D.; García-Molina, J.; Chen, P.P. A unified metamodel for NoSQL and relational databases. Inf. Syst. 2021, 104, 101898.
- Martinez-Mosquera, D.; Lujan-Mora, S.; Navarrete, R.; Mayorga, T.C.; Herrera, H.; Rodrigo, V. An approach to Big Data Modeling for Key-Value NoSQL Databases. RISTI—Rev. Ibérica Sist. e Tecnol. Informação 2019, 19, 519–530.
- Mior, M.; Salem, K.; Aboulnaga, A.; Liu, R. NoSE: Schema Design for NoSQL Applications. IEEE Trans. Knowl. Data Eng. 2017, 29, 2275–2289.
- Definition of the Running Example Queries and Entities. Available online: https://github.com/dourhriahmed/kvmod/blob/main/ma.kvmod.pidm.examples/airFlight.pidm (accessed on 14 July 2023).
- Redis Search Module (Official Documentation). Available online: https://redis.io/docs/stack/search (accessed on 24 May 2022).
- KVDesign Project. Available online: https://github.com/dourhriahmed/kvmod (accessed on 14 July 2023).
- Steinberg, D.; Budinsky, F.; Paternostro, M.; Merks, E. EMF: Eclipse Modeling Framework, 2nd ed.; Addison-Wesley Professional: Boston, MA, USA, 2009.
- Rose, L.M.; Paige, R.F.; Kolovos, D.S.; Polack, F.A. The Epsilon Generation Language. In Proceedings of the 4th European Conference on Model Driven Architecture: Foundations and Applications, Berlin, Germany, 9–13 June 2008.
- Kleppe, A. Software Language Engineering: Creating Domain-Specific Languages Using Metamodels, 1st ed.; Addison-Wesley Professional: Boston, MA, USA, 2008.
- Eysholdt, M.; Behrens, H. Xtext: Implement Your Language Faster than the Quick and Dirty Way. In Proceedings of the 25th Annual Conference on Object-Oriented Programming, Systems, Languages, and Applications, Reno/Tahoe, NV, USA, 17–21 October 2010; pp. 307–309.
- Riak Key-Value System Official Documentation. Available online: https://riak.com/products/riak-kv (accessed on 12 December 2022).
Study | Family | Conceptual | Logical | Physical | Access Query Support | MDA Use |
---|---|---|---|---|---|---|
Chebotko et al. [9] | Col | ER | Chebotko diagram | Cassandra | Yes | No |
De la Vega et al. [12] | Doc, Col | ER | Column metamodel, Document metamodel | Cassandra, Mongo | Yes | Yes |
Shoval et al. [10] | G | ER | ER | Neo4j | No | No |
Gwendal et al. [21] | G | ERD (for data), OCL (for constraints) | Graph metamodel (data), Gremlin (constraints) | Neo4j, OrientDB | No | Yes |
Ait Brahim et al. [13] | Doc, Col, G | ER | Generic Logical Metamodel | Mongo, Cassandra, Neo4j | No | Yes |
Martinez-Mosquera et al. [22] | KV | ER | ER | Generic | No | Yes |
Rossel et al. [23] | KV | ER | Rossel | Generic | No | No |
n \ m | 10 | 100 | 1000 |
---|---|---|---|
10 | 10 | 10 | 10 |
100 | 10 | 10 | 10 |
1000 | 10 | 10 | 10 |
n \ m | 10 | 100 | 1000 |
---|---|---|---|
10 | 10 µs | 1 ms | 0.1 s |
100 | 1 ms | 0.1 s | 10 s |
1000 | 0.1 s | 10 s | 17 min |
n \ m | 10 | 100 | 1000 |
---|---|---|---|
10 | 180 µs | 22 ms | 3.7 s |
100 | 35 ms | 5.1 s | 38 s |
1000 | 19 s | 95 s | 26 min |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dourhri, A.; Hanine, M.; Ouahmane, H. KVMod—A Novel Approach to Design Key-Value NoSQL Databases. Information 2023, 14, 563. https://doi.org/10.3390/info14100563