A Model for a Serialized Set-Oriented NoSQL Database Management System
Abstract
1. Introduction
- The feasibility of designing and implementing a DBMS based on set-theoretic specifications with explicit consistency semantics, together with an evaluation of system stability during the incremental insertion and commit of a medium-sized dataset;
- A comparative workload performance analysis against two NoSQL systems, namely Redis and MongoDB.
2. Background and State of the Art
3. Related Work
1. Compromised statistical validity when models assume unique records but encounter duplicate information;
2. Increased computational overhead due to the deduplication steps required during data preprocessing and ingestion;
3. Inconsistent, non-deterministic result sets, with cardinalities that differ across SQL dialects;
4. Additional deduplication logic, which adds downstream complexity and potential errors and further requires functional indexes, cleaning operations, application-level logic to ensure data integrity, or additional queries.
3.1. Set-Oriented Databases vs. SQL Semantics
- UNIQUE and PRIMARY KEY constraints, enforced at insertion time;
- the DISTINCT keyword and the GROUP BY clause, which eliminate duplicates from query results;
- UPSERT statements, which handle conflicts during data insertion or updates, such as INSERT ... ON CONFLICT in PostgreSQL and MERGE in SQL Server and Oracle.
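A minimal sketch can make the difference between bag (multiset) and set semantics concrete, and show where each mechanism above intervenes. It uses Python's built-in sqlite3 module as a stand-in SQL engine; SQLite's INSERT OR IGNORE plays the role of the conflict-handling clauses (it is not the PostgreSQL or Oracle syntax), and the two-column rows are hypothetical.

```python
import sqlite3

# Hypothetical input containing one duplicate record.
rows = [("Ana", "Pop"), ("Ana", "Pop"), ("Dan", "Ionescu")]
conn = sqlite3.connect(":memory:")

# Bag semantics: a table without constraints stores every duplicate row.
conn.execute("CREATE TABLE bag (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO bag VALUES (?, ?)", rows)
bag_count = conn.execute("SELECT COUNT(*) FROM bag").fetchone()[0]

# DISTINCT removes duplicates only at query time; storage still holds them.
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT first_name, last_name FROM bag)"
).fetchone()[0]

# A PRIMARY KEY plus a conflict clause rejects the duplicate at insertion time.
conn.execute(
    "CREATE TABLE keyed (first_name TEXT, last_name TEXT, "
    "PRIMARY KEY (first_name, last_name))"
)
conn.executemany("INSERT OR IGNORE INTO keyed VALUES (?, ?)", rows)
keyed_count = conn.execute("SELECT COUNT(*) FROM keyed").fetchone()[0]

# Set semantics: inserting into a Python set is idempotent by construction.
as_set = set(rows)
print(bag_count, distinct_count, keyed_count, len(as_set))
```

DISTINCT only hides the duplicate when querying, while the key constraint and the Python set reject it on insertion; in a set-oriented model the latter behaviour is the default rather than an opt-in constraint.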
3.2. NoSQL Systems and Data Uniqueness
4. Methods
4.1. General Architecture
4.1.1. Data Organization
4.1.2. Software Components Implementation Principles
- Server side: interprets the commands received from users and functions as the database engine. Processing a received query entails parsing, compiling, executing, and fetching the results; parsing performs the syntactic analysis, whereas compilation completes the semantic analysis. The temporary state of each database object is kept on a stack in the server process’s address space. At the beginning of a session, the server reads the binary datafile corresponding to a single user schema in order to load the serialized dictionary, whose three keys index the database’s objects. The server process writes the binary datafile only when the user commits the data, at which point the temporary state of the objects held in main memory is persisted.
- Client side: sends commands to the centralized database server after authentication. The client performs a basic syntactic check to ensure that each query is properly formatted. A user interacts with the centralized database server through a GUI with several window layouts: authentication, registration, and a terminal, i.e., a multiline graphical element from which commands are sent.
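The load-once, write-on-commit behaviour described for the server can be sketched with Python's pickle module. This is an illustration under stated assumptions, not the authors' implementation: the class SessionDatafile, the three key names and the file name are hypothetical, since the text only states that the serialized dictionary has three keys.

```python
import os
import pickle
import tempfile


class SessionDatafile:
    """Hypothetical sketch: read the datafile at session start, write it on commit."""

    # Assumed key names; the text only says the dictionary has three keys.
    KEYS = ("schemas", "collections", "sets")

    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            # Read the binary datafile once, when the session begins.
            with open(path, "rb") as fh:
                self.state = pickle.load(fh)
        else:
            self.state = {key: {} for key in self.KEYS}

    def insert(self, name, value):
        # Changes live only in the server process's memory until commit.
        self.state["sets"].setdefault(name, set()).add(value)

    def commit(self):
        # The datafile is (re)written only when the user commits.
        with open(self.path, "wb") as fh:
            pickle.dump(self.state, fh)


path = os.path.join(tempfile.mkdtemp(), "user_schema.dat")
session = SessionDatafile(path)
session.insert("x", 10)
session.insert("x", 10)                 # set semantics: the duplicate is absorbed
written_before_commit = os.path.exists(path)
session.commit()
reloaded = SessionDatafile(path).state["sets"]["x"]
```

Because unpickling crafted input can execute arbitrary code, the Python documentation advises loading pickles only from trusted sources; in this design the datafile is written and read by the server process alone.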
Listing 1. DBMS server’s main class constructor implementation and initialization in Python3.
Listing 2. Handling of multiple connections using the DBMS server’s main class method exec_server.
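Listing 2 itself is not reproduced in this text. As a hedged illustration of one common way to serve several clients at once, the sketch below accepts connections in a loop and hands each one to its own thread, using the standard socket and threading modules. The function names, the ACK reply and the loopback address are hypothetical, and the serialization and encryption layers described in Section 4.2.1 are omitted.

```python
import socket
import threading


def handle_client(conn):
    # One worker thread per connected client: answer until the client closes.
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:                  # empty read: the peer closed the socket
                break
            conn.sendall(b"ACK " + data)


def accept_loop(server_sock):
    # Hypothetical stand-in for exec_server: accept clients in a loop and
    # hand each connection to its own daemon thread.
    while True:
        try:
            conn, _addr = server_sock.accept()
        except OSError:
            break                         # listening socket is no longer usable
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()


server = socket.create_server(("127.0.0.1", 0))   # port 0: let the OS choose
port = server.getsockname()[1]
threading.Thread(target=accept_loop, args=(server,), daemon=True).start()

# Two clients connected at the same time, each served by its own thread.
clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(2)]
for i, client in enumerate(clients):
    client.sendall(f"cmd{i}".encode())
replies = [client.recv(4096) for client in clients]
for client in clients:
    client.close()
```

A thread per connection keeps the sketch short; the concurrency results in Section 5.1 would depend on how the actual exec_server schedules clients and guards the shared in-memory state.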
4.1.3. System Trade-Offs
4.2. Database Operations
- parsing;
- compiling;
- executing and fetching the results.
1. Identifying the queried values for the object of the given type in the serialized dictionary;
2. Filtering the results according to the predicate P, i.e., selecting those values for which P holds true.
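The two steps above amount to a set comprehension, {v ∈ S : P(v)}, where S is the set of values stored for the queried object. A minimal Python sketch, assuming a hypothetical in-memory layout (object type → object name → set of values) and restricting P to the single comparisons allowed by the grammar in Appendix A:

```python
import operator

# Comparison operators permitted in <expr> by the grammar in Appendix A.
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Hypothetical in-memory layout: object type -> object name -> set of values.
dictionary = {"schemas": {}, "collections": {}, "sets": {"x": {5, 10, 20, 150}}}


def select(dictionary, obj_type, name, op=None, operand=None):
    # Step 1: identify the queried values of the named object of this type.
    values = dictionary[obj_type][name]
    if op is None:                        # no WHERE clause: return every value
        return set(values)
    # Step 2: keep only the values for which the predicate P holds.
    predicate = OPS[op]
    return {v for v in values if predicate(v, operand)}


result = select(dictionary, "sets", "x", ">", 10)   # result == {20, 150}
```

Since the stored values already form a set, the filtered result is a set as well, so no deduplication step is needed after filtering.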
4.2.1. Client-Server Communication
- s: send a message;
- r: receive a message;
- l: load (deserialize) a message from its serialized bytestream;
- d: dump (serialize) a message as a bytestream;
- enc: encrypt a message using the recipient’s public key (client or server);
- dec: decrypt a message using the recipient’s private key (client or server);
- pubClientKey, pubServerKey: client/server public key;
- privClientKey, privServerKey: client/server private key.
- Sending an encrypted serialized message m from client to server: s(enc(d(m), pubServerKey))
- Sending an encrypted serialized message m from server to client: s(enc(d(m), pubClientKey))
- Loading and decrypting the bytestream b corresponding to the serialized message m from client to server: l(dec(b, privServerKey))
- Loading and decrypting the bytestream b corresponding to the serialized message m from server to client: l(dec(b, privClientKey))
- Loading and decrypting the bytestream b corresponding to the received serialized message m from client to server: l(dec(r(b), privServerKey))
- Loading and decrypting the bytestream b corresponding to the received serialized message m from server to client: l(dec(r(b), privClientKey))
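The composition s(enc(d(m), pubServerKey)) on the sending side and l(dec(r(b), privServerKey)) on the receiving side can be traced end to end in a self-contained sketch. Here d and l wrap pickle.dumps and pickle.loads, while s and r (a socket's sendall and recv) are left out. The cipher is an assumption made only to keep the example dependency-free: textbook RSA over two publicly known Mersenne primes, applied block by block. It is insecure and is not the scheme used by the authors; a real deployment would rely on a vetted library with padding such as RSA-OAEP, or on TLS.

```python
import pickle

# Toy key pair: textbook RSA over two known Mersenne primes. INSECURE; it only
# illustrates how d/enc and dec/l compose. The primes are public knowledge.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))      # modular inverse (Python 3.8+)
pubServerKey, privServerKey = (E, N), (D, N)

BLOCK = 26                             # plaintext bytes per block (value < N)
CBLOCK = (N.bit_length() + 7) // 8     # ciphertext bytes per block


def d(m):
    """dump: serialize an object into a bytestream."""
    return pickle.dumps(m)


def l(b):
    """load: deserialize a bytestream back into an object."""
    return pickle.loads(b)


def enc(data, key):
    """Encrypt a bytestream block by block with an (exponent, modulus) key."""
    exp, mod = key
    data = len(data).to_bytes(4, "big") + data     # length header
    data += b"\x00" * (-len(data) % BLOCK)         # pad to whole blocks
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        m = int.from_bytes(data[i:i + BLOCK], "big")
        out += pow(m, exp, mod).to_bytes(CBLOCK, "big")
    return bytes(out)


def dec(data, key):
    """Invert enc: decrypt each block, then strip the header and padding."""
    exp, mod = key
    plain = bytearray()
    for i in range(0, len(data), CBLOCK):
        c = int.from_bytes(data[i:i + CBLOCK], "big")
        plain += pow(c, exp, mod).to_bytes(BLOCK, "big")
    size = int.from_bytes(plain[:4], "big")
    return bytes(plain[4:4 + size])


# Client side: the bytes that s(...) would transmit, i.e. enc(d(m), pubServerKey).
message = {"command": "select", "set": "x", "where": "x > 10"}
payload = enc(d(message), pubServerKey)
# Server side, once r(...) has returned payload: l(dec(b, privServerKey)).
received = l(dec(payload, privServerKey))
```

Swapping in a client key pair gives the server-to-client direction, s(enc(d(m), pubClientKey)) followed by l(dec(r(b), privClientKey)).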
4.2.2. Query Language Design Principles
CREATE {
COLLECTION: {
NAME: {mycollection};
};
ATTR: {
SET: {x};
};
};
SELECT {
ATTR: {
COLLECTION: {mycollection};
SET: {x};
AS: {xalias};
WHERE: {xalias > 10};
};
};
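As an illustration of the syntactic stage only, the following sketch parses commands written in the brace notation above into nested Python dictionaries. It is a hypothetical recursive-descent parser, not the authors' implementation: it assumes keywords are case-insensitive (the examples above are upper-case while Appendix A uses lower case), treats any brace body that does not start with a nested key followed by an opening brace as a leaf value, and does not handle braces inside quoted strings.

```python
import re

# A key is an identifier, an optional colon, then an opening brace; this
# matches the command keyword ("SELECT {") as well as attributes ("ATTR: {").
KEY = re.compile(r"\s*([A-Za-z_]+)\s*:?\s*\{")


def parse_block(text, start):
    """Parse the body of a '{...}' whose '{' ends just before `start`."""
    depth, i = 1, start
    while depth:                          # find the matching closing brace
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
        i += 1
    body = text[start:i - 1]
    if KEY.match(body):                   # nested "KEY: {...};" pairs
        return parse_pairs(body), i
    return body.strip(), i                # leaf value, e.g. "x" or "xalias > 10"


def parse_pairs(text):
    """Parse a sequence of 'KEY: {...};' pairs into a dict with lower-case keys."""
    result, pos = {}, 0
    while (m := KEY.match(text, pos)):
        value, pos = parse_block(text, m.end())
        result[m.group(1).lower()] = value
        while pos < len(text) and text[pos] in "; \t\n":
            pos += 1                      # skip the optional ';' separators
    return result


query = """SELECT {
ATTR: {
COLLECTION: {mycollection};
SET: {x};
AS: {xalias};
WHERE: {xalias > 10};
};
};"""
ast = parse_pairs(query)
```

Semantic checks, such as verifying that the referenced collection exists, would belong to the subsequent compilation stage rather than to this parser.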
Listing 3. Stages of the command processing pipeline in the database engine.
4.3. Evaluation Metrics
5. Results and Further Improvement
5.1. Concurrency Analysis and Resource Contention
5.2. Comparative Analysis of SQL Performance
1. Memory oriented:
   - session uga memory: tracks the total amount of memory allocated in the user global area (UGA) for the session;
   - session pga memory: tracks the total amount of memory allocated in the process global area (PGA) for the session;
   - db block gets: the number of times a database block was requested in current mode from the buffer cache (a category of logical reads) during the session;
2. I/O oriented:
   - physical reads: the number of reads from disk/secondary storage performed during the session;
   - physical writes: the number of writes to disk/secondary storage performed during the session;
3. CPU oriented:
   - CPU used by this session: the total CPU time used by the session, measured in microseconds.
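Because the resource table reports an initial value followed by per-command increments, each row can be read as the difference between two snapshots of the same statistics. A hedged Python sketch of that bookkeeping follows. The query string shows one way the values might be collected from V$SYSSTAT (V$MYSTAT joined with V$STATNAME would give current-session values instead), and the after-CMD1 snapshot is reconstructed by adding the CMD1 increments to the Initial row, not taken from the authors' raw data.

```python
# Statistic names as listed above (and as Oracle spells them in V$SYSSTAT).
STATS = ("CPU used by this session", "session uga memory",
         "session pga memory", "db block gets",
         "physical reads", "physical writes")

# One way to take a snapshot before and after each command (an assumption,
# not necessarily the authors' collection procedure).
SNAPSHOT_SQL = ("SELECT name, value FROM v$sysstat WHERE name IN ("
                + ", ".join(f"'{s}'" for s in STATS) + ")")


def delta(before, after):
    """Per-statistic increment between two {statistic name: value} snapshots."""
    return {name: after[name] - before[name] for name in STATS}


# "Initial" row of the resource table, and the same row plus the CMD1 increments.
initial = dict(zip(STATS, (41163, 180375592, 340809896, 13439240, 43247, 81731)))
after_cmd1 = dict(zip(STATS, (41171, 180506552, 341203112, 13439240, 43247, 81731)))
cmd1 = delta(initial, after_cmd1)
```

Taking both snapshots in the same session, with no other activity in between, is what allows an increment to be attributed to a single command.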
CREATE TABLE FACULTY (
FIRST_NAME VARCHAR2(5),
LAST_NAME VARCHAR2(5),
UNIVERSITY VARCHAR2(5) NULL,
CONSTRAINT PK_FACULTY PRIMARY KEY (FIRST_NAME, LAST_NAME),
CONSTRAINT UNQ_UNIVERSITY UNIQUE (UNIVERSITY)
);
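The interplay of the FACULTY constraints with NULL values, which CMD1 and CMD2 exercise, can be reproduced with Python's built-in sqlite3 module as a stand-in for Oracle. In both engines a single-column UNIQUE constraint does not treat NULLs as equal, so several rows with an unknown university can coexist even though the column is declared unique. The rows are hypothetical, SQLite accepts the VARCHAR2(5) type name without enforcing its length, and the last two queries mirror CMD3 and CMD4 from Appendix B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FACULTY with the same columns and constraints; SQLite accepts VARCHAR2(5) as a
# type name but does not enforce the length, and columns are nullable by default.
conn.execute("""
    CREATE TABLE FACULTY (
        FIRST_NAME VARCHAR2(5),
        LAST_NAME  VARCHAR2(5),
        UNIVERSITY VARCHAR2(5),
        CONSTRAINT PK_FACULTY PRIMARY KEY (FIRST_NAME, LAST_NAME),
        CONSTRAINT UNQ_UNIVERSITY UNIQUE (UNIVERSITY)
    )""")
# Hypothetical rows: two faculty members whose university is unknown (NULL).
conn.executemany("INSERT INTO FACULTY VALUES (?, ?, ?)", [
    ("Ana", "Pop", "ACS"), ("Dan", "Radu", "BIO"),
    ("Ion", "Marin", None), ("Eva", "Stan", None),
])

# CMD1: both NULL rows were accepted, because UNIQUE does not compare NULLs.
null_rows = conn.execute(
    "SELECT * FROM FACULTY WHERE UNIVERSITY IS NULL").fetchall()

# A repeated non-NULL university, by contrast, violates UNQ_UNIVERSITY.
try:
    conn.execute("INSERT INTO FACULTY VALUES ('Mia', 'Lupu', 'ACS')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# CMD3 (UNION) and CMD4 (UNION ALL): the branches are disjoint here, so both
# return the same rows; UNION additionally performs duplicate elimination.
branch = ("SELECT FIRST_NAME, LAST_NAME, UNIVERSITY FROM FACULTY "
          "WHERE UNIVERSITY LIKE '{}%'")
union = conn.execute(
    f"{branch.format('A')} UNION {branch.format('B')}").fetchall()
union_all = conn.execute(
    f"{branch.format('A')} UNION ALL {branch.format('B')}").fetchall()
```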
5.3. Extension for Improved Handling of Larger Datasets
6. Future Research on Integrating the Proposed DBMS with Fuzzy Sets
Extension for Fuzzy Queries
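Under Zadeh's definition, a fuzzy set assigns each element a membership grade in [0, 1] instead of the crisp in-or-out membership used by the model above. One common way to turn this into a query, in the spirit of threshold-based fuzzy querying such as SQLf, is to evaluate a fuzzy predicate and keep the elements whose grade reaches a threshold α (an α-cut). The sketch below is hypothetical: the triangular membership function, the predicate "x is about 10" and the threshold are illustrative choices, not part of the proposed extension as described here.

```python
def triangular(a, b, c):
    """Membership function rising from a to a peak of 1 at b, back to 0 at c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


def fuzzy_select(values, mu, alpha):
    """Alpha-cut: keep each value whose grade mu(v) >= alpha, with its grade."""
    return {v: mu(v) for v in values if mu(v) >= alpha}


# Hypothetical fuzzy predicate "x is about 10" applied to a crisp set of values.
about_10 = triangular(5, 10, 15)
crisp = {4, 8, 10, 13, 20}
result = fuzzy_select(crisp, about_10, alpha=0.5)
```

Returning the grade alongside each value lets results be ranked, which a crisp WHERE clause such as where: {x > 10} cannot express.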
7. Discussion
- the client and the server running on the same network host;
- a single client connected to the server at the time the commands were executed.
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| CDC | Change Data Capture |
| CRUD | Create, Read, Update and Delete |
| DBMS | Database Management System |
| ETL | Extract, Transform, Load |
| IP | Internet Protocol |
| NoSQL | Not Only SQL |
| RDBMS | Relational Database Management System |
| RTT | Round-trip Time |
| SETL | Set Language |
| SQL | Structured Query Language |
| TCP | Transmission Control Protocol |
| TLS | Transport Layer Security |
| UUID | Universally Unique Identifier |
Appendix A
| Nonterminal Symbols | Production Rule |
|---|---|
| `<var>` | `::= [a-zA-Z_][a-zA-Z0-9_]*` |
| `<name>` | `::= '<var>' \| 0 \| [-][1-9][0-9]*[.][0-9]*` |
| `<expr>` | `::= <var> == <name> \| <var> != <name> \| <var> > <name> \| <var> >= <name> \| <var> < <name> \| <var> <= <name>` |
| `<set>` | `::= set : {<var>}[;]` |
| `<collection>` | `::= collection : { name : {<var>}[;]}[;]` |
| `<schema>` | `::= schema : {<var>}[;]` |
| `<as>` | `::= as : {<var>}[;]` |
| `<where>` | `::= where : {<expr>}[;]` |
| `<value>` | `::= value : {<name>}[;]` |
| `<create>` | `::= create {<set>}[;] \| create { [{ <collection>, attr : {<set>}}][;]};` |
| `<select>` | `::= select {attr : {[{<set>, [collection : {<var>}], [<schema>], <as>, [<where>]}]}[;]};` |
| `<update>` | `::= update {attr : {[{<set>, [collection : {<var>}], [<schema>], <as>, [<where>], [<value>]}]}[;]};` |
| `<delete>` | `::= delete {attr : {[{<set>, [collection : {<var>}], [<schema>], <as>, [<where>]}]}[;]};` |
Appendix B
| Label | Command Text |
|---|---|
| CMD1 | SELECT * FROM FACULTY WHERE UNIVERSITY IS NULL; |
| CMD2 | SELECT FIRST_NAME, LAST_NAME, COALESCE(UNIVERSITY, 'Unknown University') AS UNIVERSITY FROM FACULTY; |
| CMD3 | SELECT FIRST_NAME, LAST_NAME, UNIVERSITY FROM FACULTY WHERE UNIVERSITY LIKE 'A%' UNION SELECT FIRST_NAME, LAST_NAME, UNIVERSITY FROM FACULTY WHERE UNIVERSITY LIKE 'B%'; |
| CMD4 | SELECT FIRST_NAME, LAST_NAME, UNIVERSITY FROM FACULTY WHERE UNIVERSITY LIKE 'A%' UNION ALL SELECT FIRST_NAME, LAST_NAME, UNIVERSITY FROM FACULTY WHERE UNIVERSITY LIKE 'B%'; |
| CMD5 | SELECT DISTINCT UNIVERSITY FROM FACULTY |
| CMD6 | MERGE INTO FACULTY F USING (SELECT :1 AS FIRST_NAME, :2 AS LAST_NAME, :3 AS UNIVERSITY FROM DUAL) SRC ON (F.FIRST_NAME = SRC.FIRST_NAME AND F.LAST_NAME = SRC.LAST_NAME) WHEN MATCHED THEN UPDATE SET F.UNIVERSITY = SRC.UNIVERSITY WHEN NOT MATCHED THEN INSERT (FIRST_NAME, LAST_NAME, UNIVERSITY) VALUES (SRC.FIRST_NAME, SRC.LAST_NAME, SRC.UNIVERSITY) |
| CMD7 | MERGE INTO FACULTY F USING (SELECT :1 AS FIRST_NAME, :2 AS LAST_NAME, :3 AS UNIVERSITY FROM DUAL) SRC ON (F.FIRST_NAME = SRC.FIRST_NAME AND F.LAST_NAME = SRC.LAST_NAME) WHEN MATCHED THEN UPDATE SET F.UNIVERSITY = SRC.UNIVERSITY WHEN NOT MATCHED THEN INSERT (FIRST_NAME, LAST_NAME, UNIVERSITY) VALUES (SRC.FIRST_NAME, SRC.LAST_NAME, SRC.UNIVERSITY) |
| CMD8 | SELECT * FROM FACULTY WHERE UNIVERSITY LIKE 'A%'; |
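
CMD6 and CMD7 are the same MERGE statement, so if both are executed with the same key values and that key is not yet present, the first run takes the WHEN NOT MATCHED branch and the second the WHEN MATCHED branch. SQLite has no MERGE, but its INSERT ... ON CONFLICT ... DO UPDATE upsert (available since SQLite 3.24) expresses the same insert-or-update on the primary key. The sketch below uses it as a stand-in, with hypothetical bound values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FACULTY (FIRST_NAME TEXT, LAST_NAME TEXT, "
             "UNIVERSITY TEXT, PRIMARY KEY (FIRST_NAME, LAST_NAME))")

# Stand-in for the MERGE of CMD6/CMD7: insert when the key is new, otherwise
# update UNIVERSITY from the incoming row (SQLite's "excluded" pseudo-table).
UPSERT = """
    INSERT INTO FACULTY (FIRST_NAME, LAST_NAME, UNIVERSITY) VALUES (?, ?, ?)
    ON CONFLICT (FIRST_NAME, LAST_NAME)
    DO UPDATE SET UNIVERSITY = excluded.UNIVERSITY
"""
conn.execute(UPSERT, ("Ana", "Pop", "ACS"))   # first run: no match, so insert
conn.execute(UPSERT, ("Ana", "Pop", "BIO"))   # second run: match, so update
rows = conn.execute("SELECT * FROM FACULTY").fetchall()
```

Either way the table keeps one row per key, which is the set-like outcome that the conflict-handling clauses listed in Section 3.1 are meant to guarantee.
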
References
- Gadepally, V.; Kepner, J. Big data dimensional analysis. In Proceedings of the 2014 IEEE High Performance Extreme Computing Conference, Waltham, MA, USA, 9–11 September 2014; pp. 1–6.
- Chen, M.; Chen, W.; Cai, L. Testing of big data analytics systems by benchmark. In Proceedings of the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Västerås, Sweden, 9–13 April 2018.
- Ivanov, T.; Rabl, T.; Poess, M.; Queralt, A.; Poelman, J.; Poggi, N.; Buell, J. Big Data Benchmark Compendium. In Performance Evaluation and Benchmarking: Traditional to Big Data to Internet of Things, Proceedings of the 7th TPC Technology Conference, TPCTC 2015, Kohala Coast, HI, USA, 31 August–4 September 2015; Springer: Cham, Switzerland, 2016.
- Cantone, D.; Omodeo, E.G.; Policriti, A. Set Theory for Computing: From Decision Procedures to Declarative Programming with Sets; Springer: New York, NY, USA, 2001.
- Schwartz, J.; Dewar, R.; Dubinsky, E.; Schonberg, E. Programming with Sets: An Introduction to SETL; Springer: New York, NY, USA, 1986.
- Date, C. The Relational Model for Database Management Version 2—A Critical Analysis: Deconstructing RM/V2; Technics Publications: Basking Ridge, NJ, USA, 2024; Available online: https://www.isbnsearch.org/isbn/9781634624220 (accessed on 28 December 2025).
- Codd, E. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 1970, 13, 377–387.
- Ricciotti, W.; Cheney, J. A Formalization of SQL with Nulls. J. Autom. Reason. 2022, 66, 989–1030.
- Eessaar, E. Using Relational Databases in the Engineering Repository Systems. In Proceedings of the Eighth International Conference on Enterprise Information Systems—DISI, Paphos, Cyprus, 23–27 May 2006; pp. 30–37.
- Date, C.J.; Darwen, H. Foundation for Future Database Systems: The Third Manifesto, 2nd ed.; Addison-Wesley: Reading, MA, USA, 2000; Available online: https://dl.acm.org/doi/abs/10.5555/556540 (accessed on 28 December 2025).
- Silberschatz, A.; Korth, H.F.; Sudarshan, S. Database System Concepts, 6th ed.; McGraw-Hill: New York, NY, USA, 2010; Available online: https://isbnsearch.org/isbn/9780073523323 (accessed on 28 December 2025).
- Garcia-Molina, H.; Ullman, J.D.; Widom, J. Database Systems: The Complete Book, 2nd ed.; Pearson Prentice Hall: Upper Saddle River, NJ, USA, 2008; Available online: https://isbnsearch.org/isbn/9780131873254 (accessed on 28 December 2025).
- Date, C.J. An Introduction to Database Systems, 8th ed.; Pearson Education: Upper Saddle River, NJ, USA, 2004; Available online: https://isbnsearch.org/isbn/0321189566 (accessed on 28 December 2025).
- Wrembel, R. Data Integration, Cleaning, and Deduplication: Research Versus Industrial Projects. In Proceedings of the International Conference on Information Integration and Web, Virtual Event, 28–30 November 2022; pp. 3–17.
- Xia, W.; Jiang, H.; Feng, D.; Douglis, F.; Shilane, P.; Hua, Y. A Comprehensive Study of the Past, Present, and Future of Data Deduplication. Proc. IEEE 2016, 104, 1681–1710.
- Azeroual, O.; Jha, M.; Nikiforova, A.; Sha, K.; Alsmirat, M.; Jha, S. A Record Linkage-Based Data Deduplication Framework with DataCleaner Extension. Multimodal Technol. Interact. 2022, 6, 27.
- Costa, G.; Cuzzocrea, A.; Manco, G.; Ortale, R. Data De-duplication: A Review. In Learning Structure and Schemas from Documents; Biba, M., Xhafa, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 385–412.
- Cheng, P.; Gunawi, H.S. Storage Benchmarking with Deep Learning Workloads; Technical Report; University of Chicago: Chicago, IL, USA, 2021; Available online: https://newtraell.cs.uchicago.edu/files/tr_authentic/TR-2021-01.pdf (accessed on 28 December 2025).
- Muvva, S.M. Standardizing Open Table Formats for Big Data Analysis: Implications for Machine Learning and AI Applications. J. Artif. Intell. Cloud Comput. 2023, 2, 1–3.
- Ardeleanu, S. Relational Database Programming: A Set-Oriented Approach, 1st ed.; Apress: Bucharest, Romania, 2016; Available online: https://isbnsearch.org/isbn/9781484220795 (accessed on 28 December 2025).
- Machireddy, J.R. Research Data Quality Management and Performance Optimization for Enterprise-Scale ETL Pipelines in Modern Analytical Ecosystems. J. Data Sci. Predict. Anal. Big Data Appl. 2023, 8, 1–26.
- Redis. Transactions. Available online: https://redis.io/docs/latest/develop/using-commands/transactions/ (accessed on 28 December 2025).
- Kvet, M. Identifying and Treating NULL Values in the Oracle Database—Performance Case Study. In Proceedings of the 33rd Conference of Open Innovations Association (FRUCT), Helsinki, Finland, 24–26 May 2023; pp. 161–168.
- Irmert, F.; Daum, M.; Wegener, K.M. Modularization of Database Management Systems. In Proceedings of the 2008 EDBT Workshop on Software Engineering for Tailor-Made Data Management, SETMDM ’08, Nantes, France, 25 March 2008; pp. 40–44.
- Python Software Foundation. socket—Low-level Networking Interface. Available online: https://docs.python.org/3/library/socket.html (accessed on 2 December 2025).
- Python Software Foundation. pickle—Python Object Serialization. Available online: https://docs.python.org/3/library/pickle.html (accessed on 27 December 2025).
- Boicea, A.; Rădulescu, F.; Truică, C.; Costea, C. Database encryption using asymmetric keys: A case study. In Proceedings of the 21st International Conference on Control Systems and Computer Science (CSCS), Bucharest, Romania, 29–31 May 2017.
- Popeangă, D.; Mocanu, M.; Boicea, A.; Rădulescu, F.; Ciolofan, S. A Case Study On DBMS Stability Performance Evaluation. UPB Sci. Bull. Ser. C 2024, 86, 141–150.
- Python Software Foundation. time—Time Access and Conversions. Available online: https://docs.python.org/3/library/time.html (accessed on 4 December 2025).
- Zenodo. Performance Metrics for a Set-Oriented DBMS Model and System Parameters Analysis for Oracle. Available online: https://zenodo.org/records/18079195 (accessed on 28 December 2025).
- Oracle Corporation. Oracle Database Reference Release 26. Available online: https://docs.oracle.com/en/database/oracle/oracle-database/26/refrn/V-SYSSTAT.html (accessed on 28 December 2025).
- Zadeh, L.A. Fuzzy Sets. Inf. Control 1965, 8, 338–353.
- Zadeh, L.A. Fuzzy Logic = Computing with Words. IEEE Trans. Fuzzy Syst. 1996, 4, 103–111.
- Bosc, P.; Pivert, O. SQLf: A relational database language for fuzzy querying. IEEE Trans. Fuzzy Syst. 1995, 3, 1–17.
- Min, K.; Jananthan, H.; Kepner, J. Fuzzy Relational Databases via Associative Arrays. In Proceedings of the IEEE MIT Undergraduate Research Technology Conference (URTC), Cambridge, MA, USA, 6–8 October 2023; pp. 1–5.
- Zongmin, M.; Li, Y. Data modeling and querying with fuzzy sets: A systematic survey. Fuzzy Sets Syst. 2022, 445, 147–183.
- Suharjito, S. Query Optimization Using Fuzzy Logic in Integrated. Indones. J. Electr. Eng. Comput. Sci. 2016, 4, 637–642.
- Sharma, P. Retrieval of Information Using Fuzzy Queries. Int. J. Eng. Tech. 2016, 2, 118–122.






| Command Text | Proposed DBMS [ms] | Redis 7.0 [ms] | MongoDB 8.0 [ms] | Oracle DBMS 21c [ms] |
|---|---|---|---|---|
| create { collection: { name: {my_collection}; }; attr: { set: {x};};}; | 9.1987 | 0.0849 | 4.8206 | 13.6306 |
| insert { attr: { value: {'string12 34# '}; collection: { my_collection }; set: {x};};}; | 9.1232 | 0.0720 | 0.2620 | 1.4671 |
| insert { attr: { value: {-16}; collection: {my_collection}; set: {x};};}; | 9.1965 | 0.0602 | 0.2882 | 1.0991 |
| insert { attr: { value: {''}; collection: {my_collection}; set: {x};};}; | 9.0709 | 0.0532 | 0.2288 | 0.9792 |
| select { attr: { set: {x}; collection: {my_collection}; as:{x};};}; | 9.3135 | 0.0492 | 0.0050 | 3.1364 |
| delete { attr: { where: {x == ''}; as: {x}; set: {x}; collection: {my_collection};};}; | 9.5891 | 0.0523 | 0.2286 | 3.0212 |
| update { attr: { where: {x == 'string12 34#'}; value: {1234}; as: {x}; set: {x}; collection: {my_collection};};}; | 9.4857 | 0.0505 | 0.1985 | 1.5817 |
| update { attr: { where: {x == -16}; value: {16}; as: {x}; set: {x}; collection: {my_collection};};}; | 9.4697 | 0.0502 | 0.1823 | 3.1619 |
| select { attr: { set: {x}; collection: {my_collection}; as: {x}; where: {x > 100};};}; | 9.4842 | 0.0980 | 0.2065 | 2.6276 |
| delete { attr: { as: {x}; set: {x}; collection: {my_collection};};}; | 9.2470 | 0.0502 | 0.2624 | 1.2402 |
| create { set: { name: {x};};}; | 7.9932 | 0 | 4.7694 | 9.2077 |
| insert { attr: { value: {10}; set: {x};};}; | 8.0337 | 0.0502 | 0.2346 | 1.3914 |
| insert { attr: { value: {20}; set: {x};};}; | 8.0067 | 0.0476 | 0.3254 | 0.9595 |
| select { attr: { set: {x}; as: {x}; where: {x > 10};};}; | 9.4766 | 0.0553 | 0.2624 | 4.5545 |
| drop { collection: { name: {my_collection};};}; | 7.9428 | 0.0458 | 0.3260 | 23.2416 |
| commit {}; | 8.4013 | - | - | 0.6323 |

| No. of Connected Clients | Total Execution Time [s] | Throughput [Commands/s] |
|---|---|---|
| 8 | 3.3013 | 436.1918 |
| 16 | 7.4329 | 387.4665 |
| 32 | 15.9023 | 362.2117 |
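
The throughput column is consistent with throughput = total commands / total execution time. Multiplying the two columns back gives about 1,440, 2,880 and 5,760 commands, i.e., roughly 180 commands per client in every configuration. The per-client count is inferred from the table here rather than stated in the text, and the short check below simply redoes that arithmetic.

```python
# (connected clients, total execution time [s], throughput [commands/s])
rows = [(8, 3.3013, 436.1918), (16, 7.4329, 387.4665), (32, 15.9023, 362.2117)]

per_client = []
for clients, seconds, throughput in rows:
    total_commands = throughput * seconds      # throughput = commands / time
    per_client.append(total_commands / clients)
    print(f"{clients:>2} clients: {total_commands:.1f} commands, "
          f"{total_commands / clients:.1f} per client")
```

Read this way, doubling the number of clients roughly doubles the total work while throughput falls only moderately, from about 436 to 362 commands/s.
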

| SQL Command | CPU: CPU Used by This Session [μs] | Memory: Session UGA Memory [Bytes] | Memory: Session PGA Memory [Bytes] | Memory: DB Block Gets | I/O: Physical Reads | I/O: Physical Writes |
|---|---|---|---|---|---|---|
| Initial | 41,163 | 180,375,592 | 340,809,896 | 13,439,240 | 43,247 | 81,731 |
| CMD1 | +8 | +130,960 | +393,216 | 0 | 0 | 0 |
| CMD2 | +8 | −24 | +196,608 | 0 | 0 | 0 |
| CMD3 | +11 | 0 | −458,752 | 0 | 0 | 0 |
| CMD4 | +8 | 0 | +655,360 | 0 | 0 | 0 |
| CMD5 | +17 | +655,456 | 0 | 0 | 0 | 0 |
| CMD6 | +14 | +655,456 | +655,360 | +10 | 0 | 0 |
| CMD7 | +12 | 0 | 0 | +1 | 0 | 0 |
| CMD8 | +14 | +130,912 | +262,144 | +655 | 0 | +4 |

| Use Case | Proposed Set-Oriented DBMS Model | Extension for Fuzzy Sets |
|---|---|---|
| Clear, uniquely defined, and non-ambiguous information | ✓ | X |
| Ambiguous information | X | ✓ |
| Duplicate information | X | X |
| Transactionally consistent information | ✓ | ✓ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Șerban, A.-G.; Boicea, A. A Model for a Serialized Set-Oriented NoSQL Database Management System. Information 2026, 17, 84. https://doi.org/10.3390/info17010084




