# Data Synchronization: A Complete Theoretical Solution for Filesystems


## Abstract


## 1. Introduction and Related Work

#### Our Contribution

## 2. Methodology

- R1 (intention-confined effect): operations applied to the replicas by the synchronizer must be based on operations generated by the end-user; and
- R2 (aggressive effect preservation): the effect of compatible operations should be preserved fully, and the effect of conflicting operations should be preserved as much as possible.

- C0: $\mu $ is applicable to the original filesystem $\Phi $,
- C1: every command in $\mu $ can be found either in $\alpha $, in $\beta $, or in both, and
- C2: $\mu $ is maximal, i.e., no canonical sequence adding more commands to $\mu $ can satisfy both C0 and C1.

1. rolling back some of the commands in $\alpha $, followed by (*)
2. applying some commands from the other sequence $\beta $,
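The rollback-then-apply pattern can be illustrated on a toy model. The Python below is a minimal sketch, not the authors' implementation. It assumes that a filesystem is a dictionary from node paths (tuples of names, with `()` as the root) to content, that empty, directory and file content are encoded as `"O"`, `"D"` and `("F", data)`, and that the tree property means every non-empty node has a directory as its parent. It takes the inverse of a command $\langle n,x,y\rangle $ to be $\langle n,y,x\rangle $ and undoes rolled-back commands in reverse execution order. The names `Command`, `apply_seq`, `inverse` and `rollback_then_apply` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed encodings (hypothetical): empty and directory content are "O" and "D";
# file content is a tuple ("F", data). Nodes are paths such as ("a", "f").
EMPTY, DIR = "O", "D"


@dataclass(frozen=True)
class Command:
    n: tuple   # node the command acts on
    x: object  # required content at n before execution
    y: object  # content at n after execution


def content(fs: dict, n: tuple):
    # The root () is always a directory; nodes missing from fs are empty.
    return DIR if n == () else fs.get(n, EMPTY)


def has_tree_property(fs: dict) -> bool:
    # Assumed reading: every non-empty node has a directory as its parent.
    return all(content(fs, n[:-1]) == DIR for n in fs)


def apply_seq(fs: dict, seq) -> Optional[dict]:
    """Execute commands in order; return None as soon as one breaks the filesystem."""
    for cmd in seq:
        if cmd.n == () or content(fs, cmd.n) != cmd.x:
            return None  # precondition x does not hold
        fs = dict(fs)
        if cmd.y == EMPTY:
            fs.pop(cmd.n, None)
        else:
            fs[cmd.n] = cmd.y
        if not has_tree_property(fs):
            return None
    return fs


def inverse(cmd: Command) -> Command:
    # <n, x, y> is undone by <n, y, x>.
    return Command(cmd.n, cmd.y, cmd.x)


def rollback_then_apply(fs: dict, rolled_back, applied) -> Optional[dict]:
    # Undo the chosen local commands (latest first), then replay the remote ones.
    undo = [inverse(c) for c in reversed(rolled_back)]
    return apply_seq(fs, undo + list(applied))


phi = {("a",): DIR}                                # original: one directory "a"
alpha = [Command(("a",), DIR, EMPTY)]              # replica 1 deletes "a"
beta = [Command(("a", "f"), EMPTY, ("F", "new"))]  # replica 2 creates a/f

phi1 = apply_seq(phi, alpha)
phi2 = apply_seq(phi, beta)
print(apply_seq(phi1, beta))                           # prints None
print(rollback_then_apply(phi1, alpha, beta) == phi2)  # prints True
```

Applying $\beta $ directly to the first replica breaks it, because `a/f` would have no directory parent; after rolling back the deletion of `a`, the same commands bring the first replica to the state of the second one.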

## 3. Definitions

#### 3.1. Namespace and Filesystems

#### 3.2. Internal Filesystem Commands

- $n\in \mathbb{N}$ is the node on which the command acts,
- $x$ is the content at node $n$ before the command is executed (precondition), and
- $y$ is the content at node $n$ after the command is executed.

For ${\mathrm{f}}_{1}$ and ${\mathrm{f}}_{2}\in \mathbb{F}$, the command $\langle n,{\mathrm{f}}_{1},{\mathrm{f}}_{2}\rangle $ replaces the content ${\mathrm{f}}_{1}$ stored at node $n$ by the new content ${\mathrm{f}}_{2}$. This latter command can be considered the equivalent of edit$(n,{\mathrm{f}}_{1},{\mathrm{f}}_{2})$.

- $\Phi $ contains $x$ at the node $n$, that is, $\Phi \left(n\right)=x$, and
- after changing the content at $n$ to $y$, the filesystem still has the tree property.
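To make the command semantics and the two applicability conditions concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes paths as tuples of names with `()` as the root, encodes $\mathbb{O}$, $\mathbb{D}$ and file content $\mathrm{f}$ as `"O"`, `"D"` and `("F", data)`, and reads the tree property as "every non-empty node has a directory parent". The names `Command`, `apply` and `apply_seq` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

EMPTY, DIR = "O", "D"  # assumed encodings; file content is ("F", data)


@dataclass(frozen=True)
class Command:
    n: tuple   # node, a path such as ("a", "f"); () is the root
    x: object  # precondition: content at n before execution
    y: object  # content at n after execution


def content(fs: dict, n: tuple):
    # The root is a directory; nodes absent from the dict are empty.
    return DIR if n == () else fs.get(n, EMPTY)


def has_tree_property(fs: dict) -> bool:
    # Assumed reading of the tree property: non-empty nodes have directory parents.
    return all(content(fs, n[:-1]) == DIR for n in fs)


def apply(fs: dict, cmd: Command) -> Optional[dict]:
    """Return the new filesystem, or None if cmd is not applicable to fs."""
    if cmd.n == () or content(fs, cmd.n) != cmd.x:  # condition 1: Phi(n) = x
        return None
    new = dict(fs)
    if cmd.y == EMPTY:
        new.pop(cmd.n, None)
    else:
        new[cmd.n] = cmd.y
    return new if has_tree_property(new) else None  # condition 2: tree property


def apply_seq(fs: dict, seq) -> Optional[dict]:
    # Apply commands left to right; stop at the first one that breaks fs.
    for cmd in seq:
        fs = apply(fs, cmd)
        if fs is None:
            return None
    return fs


mkdir_a = Command(("a",), EMPTY, DIR)
create_f = Command(("a", "f"), EMPTY, ("F", "hello"))
edit_f = Command(("a", "f"), ("F", "hello"), ("F", "bye"))  # an edit(n, f1, f2)

print(apply_seq({}, [mkdir_a, create_f, edit_f]))
print(apply_seq({}, [create_f, mkdir_a]))  # None: a/f would lack a directory parent
```

The second call fails on the tree-property condition, while the same two commands in the other order succeed, which shows why the order of commands on comparable nodes matters.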

#### 3.3. Command Types and Execution Order

**Definition 1**

## 4. Canonical Sequences and Sets

**Proposition 1**

- (a) $\sigma \tau \sqsubseteq \omega $ for some $\omega \in \Omega $ also on the same node $n$,
- (b) $\sigma \tau \equiv \epsilon $, that is, the pair breaks every filesystem.

**Proof.**

**Proposition 2**

- (a)
- if $\sigma \phantom{\rule{0.0pt}{0ex}}\Vert \phantom{\rule{0.0pt}{0ex}}\tau $ then $\sigma \tau \equiv \tau \sigma $, and
- (b)
- if $\sigma \phantom{\rule{0.0pt}{0ex}}\nparallel \phantom{\rule{0.0pt}{0ex}}\tau $ then $\sigma \tau \neg \equiv \u03f5$ if and only if $\sigma \ll \tau $.

**Proof.**

#### 4.1. Canonical Sequences

**Definition 2**

- (a) it does not contain null commands, that is, commands of the form $\langle n,x,x\rangle $,
- (b) no two commands in the sequence are on the same node, and
- (c) it is non-breaking.
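Conditions (a) and (b) can be checked syntactically, whereas (c) quantifies over all filesystems. The Python sketch below illustrates this split under an assumed toy encoding (paths as tuples with `()` as the root; `"O"`, `"D"` and `("F", data)` for empty, directory and file content; tree property read as "non-empty nodes have directory parents"). It checks (a) and (b) directly and checks (c) only on one given witness filesystem: success proves that the sequence is non-breaking, but failure on a single filesystem does not by itself prove that the sequence breaks every filesystem. The names `applies_to` and `looks_canonical` are hypothetical.

```python
from dataclasses import dataclass

EMPTY, DIR = "O", "D"  # assumed encodings; file content is ("F", data)


@dataclass(frozen=True)
class Command:
    n: tuple   # node path; () is the root
    x: object  # content before execution
    y: object  # content after execution


def content(fs, n):
    return DIR if n == () else fs.get(n, EMPTY)


def applies_to(fs, seq) -> bool:
    # True if seq executes on fs without breaking it (precondition + tree property).
    for cmd in seq:
        if cmd.n == () or content(fs, cmd.n) != cmd.x:
            return False
        fs = {k: v for k, v in fs.items() if k != cmd.n}
        if cmd.y != EMPTY:
            fs[cmd.n] = cmd.y
        if any(content(fs, m[:-1]) != DIR for m in fs):
            return False
    return True


def looks_canonical(seq, witness) -> bool:
    no_null = all(c.x != c.y for c in seq)          # (a) no <n, x, x>
    distinct = len({c.n for c in seq}) == len(seq)  # (b) one command per node
    non_breaking = applies_to(witness, seq)         # (c) witnessed on one filesystem
    return no_null and distinct and non_breaking


mk = Command(("a",), EMPTY, DIR)
cr = Command(("a", "f"), EMPTY, ("F", "data"))
print(looks_canonical([mk, cr], {}))                                # True
print(looks_canonical([mk, Command(("a",), DIR, DIR)], {}))         # False
print(looks_canonical([mk, Command(("a",), DIR, ("F", "x"))], {}))  # False
```

The second sequence violates (a) (and (b)); the third is applicable to the empty filesystem but violates (b).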

**Proposition 3**

**Proof.**

**Definition 3.**

- (a) $\alpha $ honors $\ll $ if, for all commands $\sigma ,\tau \in \alpha $, $\sigma $ precedes $\tau $ whenever $\sigma \ll \tau $.
- (b) $\alpha $ is $\ll $-connected if, for any two commands $\sigma ,\tau \in \alpha $, either $\sigma \parallel \tau $, or $\sigma $ and $\tau $ are connected by an $\ll $-chain (see Definition 1) whose elements are in $\alpha $.

**Theorem 1**

- (a) it does not contain null commands,
- (b) no two commands in the sequence are on the same node,
- (c1) it honors $\ll $, and
- (c2) it is $\ll $-connected.

**Proof.**

#### 4.2. Canonical Sets

**Proposition 4**

**Proof.**

**Definition 4**

- (a) it does not contain null commands,
- (b) no two commands in $A$ are on the same node, and
- (c2) the set $A$ is $\ll $-connected, meaning that if two of its commands are on comparable nodes, then they are connected by an $\ll $-chain whose elements are in $A$.

**Definition 5.**

**Proposition 5.**

**Proof.**

#### 4.3. The Update Detector

**Proposition 6**

**Proof.**

**Proposition 7**

**Proof.**
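Figure 1 notes that an update detector may determine $\alpha $ by comparing replica states, e.g., ${\Phi}_{1}$ against $\Phi $. A minimal sketch of that state-based approach follows, under assumed encodings (paths as tuples with `()` as the root; `"O"`, `"D"` and `("F", data)` for empty, directory and file content; tree property read as "non-empty nodes have directory parents"). It emits one command $\langle n,\Phi (n),{\Phi}_{1}(n)\rangle $ for each node whose content differs. To obtain an executable sequence, it then orders the set with a simple heuristic: deletions deepest-first, followed by all other commands shallowest-first. This heuristic is an assumption of the sketch, not the paper's order $\ll $. The names `detect_updates` and `execution_order` are hypothetical.

```python
from dataclasses import dataclass

EMPTY, DIR = "O", "D"  # assumed encodings; file content is ("F", data)


@dataclass(frozen=True)
class Command:
    n: tuple   # node path; () is the root
    x: object  # content in the original filesystem
    y: object  # content in the updated replica


def content(fs, n):
    return DIR if n == () else fs.get(n, EMPTY)


def detect_updates(phi, phi1):
    """One command per node whose content differs between phi and phi1."""
    nodes = set(phi) | set(phi1)
    return {Command(n, content(phi, n), content(phi1, n))
            for n in nodes if content(phi, n) != content(phi1, n)}


def execution_order(cmds):
    # Heuristic: empty nodes bottom-up first, then fill nodes top-down.
    deletions = sorted((c for c in cmds if c.y == EMPTY), key=lambda c: -len(c.n))
    others = sorted((c for c in cmds if c.y != EMPTY), key=lambda c: len(c.n))
    return deletions + others


def apply_seq(fs, seq):
    for cmd in seq:
        if content(fs, cmd.n) != cmd.x:
            return None  # precondition violated
        fs = {k: v for k, v in fs.items() if k != cmd.n}
        if cmd.y != EMPTY:
            fs[cmd.n] = cmd.y
        if any(content(fs, m[:-1]) != DIR for m in fs):
            return None  # tree property violated
    return fs


phi = {("a",): DIR, ("a", "f"): ("F", "v1"), ("b",): DIR}
phi1 = {("a",): ("F", "v2"), ("b",): DIR, ("b", "g"): ("F", "v3")}

alpha = detect_updates(phi, phi1)
for cmd in execution_order(alpha):
    print(cmd)
print(apply_seq(phi, execution_order(alpha)) == phi1)  # True
```

Here the file `a/f` must disappear before `a` can turn from a directory into a file, which is why deletions are scheduled first. For two filesystems that both satisfy the assumed tree property, this heuristic should always produce an applicable order.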

## 5. The Reconciler—Synchronizing Two Replicas

**Definition 6**

**Theorem 2.**

- (a) $MA\circ {A}_{1}^{-1}\circ {B}^{\prime}$, where ${A}_{1}=A\setminus M\subseteq A$ and ${B}^{\prime}=M\setminus A\subseteq B$,
- (b) $MB\circ {B}_{1}^{-1}\circ {A}^{\prime}$, where ${B}_{1}=B\setminus M\subseteq B$ and ${A}^{\prime}=M\setminus B\subseteq A$,
- (c) if both $A$ and $B$ are applicable to a filesystem, then so are $M$, $A\circ {A}_{1}^{-1}\circ {B}^{\prime}$, and $B\circ {B}_{1}^{-1}\circ {A}^{\prime}$.

**Proposition 8.**

**Proof.**

**Proposition 9.**

**Proof.**

**Proposition 10.**

**Proof.**

**Proposition 11.**

**Proof.**

## 6. Synchronization by Conflict Resolution

- discard both commands,
- discard one command and keep the other (loser/winner paradigm), or
- replace one or both commands by a new one.

#### 6.1. Theoretical Foundation

**Proposition 12.**

**Proof.**

**Definition 7.**

**Proposition 13.**

**Proof.**

**Proposition 14.**

**Proof.**

**Proposition 15.**

**Proof.**

**Corollary 1**

**Proof.**

#### 6.2. The Synchronization Algorithm

**Algorithm 1** (Synchronization in the general case).

**Correctness of Algorithm 1.**

**Complexity of Algorithm 1.**

1. $A\Subset {A}^{o}$, $B\Subset {B}^{o}$; consequently, $A$ and $B$ are refluent canonical subsets of ${A}^{o}$ and ${B}^{o}$;
2. all mergers of $A$ and $B$ are mergers of ${A}^{o}$ and ${B}^{o}$;
3. if ${M}^{o}$ is specified, then ${M}^{o}$ is a merger of $A$ and $B$.

**Algorithm 2** (Synchronization of disjoint sets).

- Iterate: Choose $\sigma \in A$ and $\tau \in B$ such that $(\sigma ,\tau )$ is a conflict. If no such pair exists, go to Finish. If ${M}^{o}$ is given, choose a conflict so that either $\sigma $ or $\tau $ is in ${M}^{o}$.
- Conflict resolution: Out of $\sigma $ and $\tau $, choose the winner and the loser commands. If ${M}^{o}$ is given, let the winner be the one in ${M}^{o}$. If $\sigma $ is the winner, delete all commands from $B$ that are in conflict with $\sigma $ (including $\tau $). If $\tau $ is the winner, delete all commands from $A$ that are in conflict with $\tau $. Go back to Iterate.
- Finish: Return $A\cup B$ as the merger.
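Algorithm 2, wrapped by the outline of Algorithm 1 given in Figure 6, can be sketched in Python as follows. This is an illustration, not the authors' code. The conflict relation is passed in as a predicate `conflicts(sigma, tau)`, since its precise definition (Section 6) lies outside this sketch; commands are arbitrary hashable values; and the sets are kept as lists so that the next conflict is chosen deterministically (the first one found). Two further rules are assumptions of the sketch: without ${M}^{o}$ the command from $A$ wins, and if no remaining conflict touches ${M}^{o}$, any conflict is taken. The names `algorithm1` and `algorithm2` are hypothetical.

```python
from typing import Callable, Iterable, Optional

Conflict = Callable[[object, object], bool]


def algorithm2(A: Iterable, B: Iterable, conflicts: Conflict,
               hint: Optional[set] = None) -> set:
    """Greedy winner/loser resolution on disjoint command sets A and B."""
    A, B = list(A), list(B)
    while True:
        # Iterate: collect the remaining conflicting pairs (sigma in A, tau in B).
        pairs = [(s, t) for s in A for t in B if conflicts(s, t)]
        if not pairs:
            return set(A) | set(B)  # Finish: the union is the merger
        if hint is not None:
            # Prefer a conflict with one side in M^o (fallback: any conflict).
            pairs = [p for p in pairs if p[0] in hint or p[1] in hint] or pairs
        sigma, tau = pairs[0]
        # Conflict resolution: the command in M^o wins; otherwise sigma wins.
        tau_wins = hint is not None and tau in hint and sigma not in hint
        if tau_wins:
            A = [s for s in A if not conflicts(s, tau)]    # removes sigma too
        else:
            B = [t for t in B if not conflicts(sigma, t)]  # removes tau too


def algorithm1(A0: Iterable, B0: Iterable, conflicts: Conflict,
               hint: Optional[set] = None) -> set:
    """Split off the common part C, merge the rest with algorithm2, add C back."""
    A0, B0 = list(A0), list(B0)
    common = [c for c in A0 if c in B0]
    A = [c for c in A0 if c not in common]
    B = [c for c in B0 if c not in common]
    return algorithm2(A, B, conflicts, hint) | set(common)


# Toy instance with an explicitly listed (hypothetical) conflict relation.
edges = {("s1", "t1"), ("s2", "t1"), ("s2", "t2")}


def toy_conflicts(s, t):
    return (s, t) in edges


print(sorted(algorithm1(["c", "s1", "s2"], ["c", "t1", "t2"], toy_conflicts)))
print(sorted(algorithm1(["c", "s1", "s2"], ["c", "t1", "t2"], toy_conflicts,
                        hint={"c", "t1", "t2"})))
```

In this sketch, each pass removes at least the loser, so the loop terminates after at most $|A|+|B|$ resolutions, and the returned set contains no conflicting pair by construction.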

**Correctness of Algorithm 2.**

**Complexity of Algorithm 2.**

#### 6.3. Structural Properties of the Conflict Graph

## 7. Conclusions

#### 7.1. Synchronizing More Than Two Replicas

#### 7.2. Efficiency of the Algorithms

#### 7.3. Attributes

#### 7.4. Links

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

| Abbreviation or symbol | Meaning |
| --- | --- |
| CRDT | Conflict-free Replicated Data Type |
| CSCW | Computer Supported Collaborative Work |
| OT | Operational Transformation |
| $\mathbb{O}$, $\mathbb{F}$, $\mathbb{D}$ | empty, file, and directory content |
| $\Phi $, $\Psi $ | filesystem |


**Figure 1.** Outline of the synchronization process. Identical replicas of the original filesystem $\Phi $ are updated (modified), yielding the divergent replicas ${\Phi}_{1}$ and ${\Phi}_{2}$. The reconciler uses the update information $\alpha $ and $\beta $ extracted by the update detectors, and generates the synchronizing instructions ${\alpha}_{1}$ and ${\beta}_{1}$. These create the identical merged state $\Psi $ when applied to the replicas. The update detectors determine the update information $\alpha $ and $\beta $ either by comparing the different states of the replicas (e.g., ${\Phi}_{1}$ vs. $\Phi $), or by having access to the update instructions ${\alpha}_{0}$ and ${\beta}_{0}$ that were applied to $\Phi $.

**Figure 2.** Structure of an up (left) and down (right) $\ll $-chain. Black arrows point to the parent node, and red arrows show the execution order $\ll $.

**Figure 3.** Illustrating the update detector. The nodes are, from left to right, ${n}_{1}$ to ${n}_{5}$ in the top row, and ${n}_{6}$ to ${n}_{9}$ in the bottom row. The first replica deletes all directories; the second replica stores file content ${\mathrm{f}}_{i}$ at node ${n}_{i}$.

**Figure 4.** Three possible results of synchronizing the filesystems in Figure 3.

**Figure 5.** (**a**) The conflict graph of the synchronization problem of Figure 3. The conflict $({\sigma}_{2},{\tau}_{7})$ is resolved with the winner ${\tau}_{7}$. (**b**) The conflict graph after resolving $({\sigma}_{2},{\tau}_{7})$; commands ${\sigma}_{1}$ and ${\sigma}_{2}$ are deleted. The next conflict we resolve is $({\sigma}_{4},{\tau}_{5})$. (**c**) The conflict graph after resolving $({\sigma}_{4},{\tau}_{5})$ with the winner ${\sigma}_{4}$; commands ${\tau}_{9}$ and ${\tau}_{5}$ are deleted. The final conflict graph contains the commands ${\sigma}_{4}$, ${\sigma}_{5}$, ${\tau}_{6}$, ${\tau}_{7}$, ${\tau}_{8}$, and no edges.

**Figure 6.** Outline of Algorithm 1. Given the input $({A}^{o},{B}^{o},{M}^{o})$, extract the common part $C$ of ${A}^{o}$ and ${B}^{o}$, and call Algorithm 2 with the reduced sets. Finally, add $C$ to the returned ${M}^{\prime}$ to generate the output $M$.

**Figure 7.** Outline of Algorithm 2. The Initialize step sets $A$ and $B$. The Iterate step is executed until no more conflicts are found. Conflicts are resolved using the hint from the additional input ${M}^{o}$, then the command sets $A$ and $B$ are reduced. If no conflict remains, the merger $M=A\cup B$ is returned by the Finish step.


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Csirmaz, E.P.; Csirmaz, L.
Data Synchronization: A Complete Theoretical Solution for Filesystems. *Future Internet* **2022**, *14*, 344.
https://doi.org/10.3390/fi14110344
