# Synchronizing Many Filesystems in Near Linear Time

## Abstract

## 1. Introduction and Related Works

- Providing the theoretical foundation for synchronizing an arbitrary number of replicas;
- Developing, for the first time, a provably correct synchronization algorithm that runs in linear time after an initial sort, and thus in near linear (in particular, subquadratic) total running time;
- Allowing asynchronous usage: after synchronization has been requested, the local replicas need not be locked;
- Allowing late-comers: a replica can be brought up to the synchronized state without providing its local changes;
- Generalizing the traditional tree-like filesystem skeleton to arbitrary directed acyclic graphs, thus extending the applicability of the synchronization algorithm.

## 2. Definitions

#### 2.1. Filesystems

#### 2.2. Filesystem Commands

**Definition 1**

For $\mathsf{f}_1,\mathsf{f}_2\in\mathbb{F}$, the command $\langle n,\mathsf{f}_1,\mathsf{f}_2\rangle$ replaces the content $\mathsf{f}_1$ stored at $n$ by the new content $\mathsf{f}_2$. This latter command can be considered the equivalent of edit$(n,\mathsf{f}_2)$.

Here, $\mathsf{f}_o$ is the file content at the “$\mathsf{text}$” node.
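Commands are triples $\langle n,x,y\rangle$. The Python sketch below shows one way to apply such a command. It rests on assumptions of ours, since they are not restated in this excerpt. A filesystem is modeled as a dict from nodes (tuples of path components, with `()` as the root) to values: `"O"` for empty, `"D"` for a directory, and any other string for file content. Only non-empty nodes are stored. A valid filesystem is assumed to satisfy the tree property: every non-empty node other than the root has a directory as its parent. A command breaks the filesystem when its input does not match the current value or when its result violates that property. The names `apply_command` and `BrokenCommand` are hypothetical.

```python
from typing import Dict, Tuple

Node = Tuple[str, ...]            # path components; () is the root
Command = Tuple[Node, str, str]   # (node, input value, output value)
EMPTY, DIR = "O", "D"             # every other string stands for file content


class BrokenCommand(Exception):
    """Raised when a command does not apply to the filesystem (it breaks it)."""


def value(fs: Dict[Node, str], n: Node) -> str:
    # Only non-empty nodes are stored; anything missing is empty.
    return fs.get(n, EMPTY)


def apply_command(fs: Dict[Node, str], cmd: Command) -> Dict[Node, str]:
    n, x, y = cmd
    if value(fs, n) != x:
        raise BrokenCommand(f"{n}: expected input {x!r}, found {value(fs, n)!r}")
    new = dict(fs)                # do not modify the caller's filesystem
    if y == EMPTY:
        new.pop(n, None)
    else:
        new[n] = y
    # Assumed tree property: a non-empty, non-root node needs a directory parent ...
    if y != EMPTY and n and value(new, n[:-1]) != DIR:
        raise BrokenCommand(f"{n}: parent is not a directory")
    # ... hence a node that stops being a directory must have no non-empty children.
    if y != DIR and any(len(m) == len(n) + 1 and m[:-1] == n for m in new):
        raise BrokenCommand(f"{n}: still has children")
    return new
```

For instance, applying `(("a", "text"), "f0", "f1")` to `{(): "D", ("a",): "D", ("a", "text"): "f0"}` performs the equivalent of edit$(\mathsf{/a/text},\mathsf{f}_1)$. Removing the directory `/a` at that point raises `BrokenCommand`, because `/a/text` is still non-empty.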

#### 2.3. Command Types, Execution Order

**Definition 2**

#### 2.4. Canonical Sets and Sequences

**Definition 3**

- A does not contain null commands;
- A contains at most one command on each node;
- A is ≪-connected.

**Theorem 1**

- (a) If two canonical sequences share the same command set, then they are semantically equivalent. In fact, they can be transformed into each other using commutativity rules.
- (b) Canonical sequences are non-breaking.
- (c) Every non-breaking sequence $\alpha$ can be transformed into a canonical sequence $\alpha^{*}\sqsupseteq\alpha$ (that is, $\alpha$ and $\alpha^{*}$ have the same effect on filesystems that $\alpha$ does not break, but $\alpha^{*}$ might work on more filesystems).
- (d) Canonical sets can be ordered to honor ≪, that is, to become canonical sequences. □

**Definition 4**

- $\{\langle \mathsf{/a/b/c/d},\mathsf{f}_s,\mathbb{O}\rangle,\langle \mathsf{/a/b},\mathbb{D},\mathbb{O}\rangle,\langle \mathsf{/a/b/c},\mathbb{D},\mathbb{O}\rangle\}$,
- $\{\langle \mathsf{/a/b/c/d},\mathsf{f}_s,\mathbb{O}\rangle,\langle \mathsf{/a/b/c},\mathbb{D},\mathbb{O}\rangle\}$, and
- $\{\langle \mathsf{/a/b/c/d},\mathsf{f}_s,\mathbb{O}\rangle\}$

#### 2.5. Refluent Sets

## 3. Filesystem Synchronization

#### 3.1. Update Detector

**Theorem 2**

#### 3.2. Synchronization

- (1) Every command in M is submitted by one of the replicas;
- (2) The canonical set M is maximal with respect to the first condition.

**Definition 5**

- $A_1=\{\sigma_1,\sigma_2,\sigma_3\}$, where $\sigma_1=\langle \mathsf{/a/b/c},\mathsf{f}_o,\mathbb{O}\rangle$, $\sigma_2=\langle \mathsf{/a/b},\mathbb{D},\mathbb{O}\rangle$, and $\sigma_3=\langle \mathsf{/a},\mathbb{D},\mathbb{O}\rangle$;
- $A_2=\{\tau\}$, where $\tau=\langle \mathsf{/a/b/z},\mathbb{O},\mathsf{f}_z\rangle$;
- $A_3=\{\rho_1,\rho_2\}$, where $\rho_1=\langle \mathsf{/a/z},\mathbb{O},\mathsf{f}_u\rangle$ and $\rho_2=\langle \mathsf{/a/b/z},\mathbb{O},\mathsf{f}_u\rangle$.

#### 3.3. Mergers Are Applicable to the Filesystem

#### 3.4. Mergers Can Be Created in Near Linear Time

#### 3.5. Mergers Have an Operational Characterization

#### 3.6. Mergers Can Be Created via Conflict Resolution

**Definition 6.**

- (a) They are different commands on the same node.
- (b) The node of σ is above the node of τ, σ creates a non-directory, and τ creates non-empty content.
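Definition 6 can be turned into a pairwise test. The sketch below reflects our own reading of conditions (a) and (b). We take "σ creates a non-directory" to mean that the output of σ is not $\mathbb{D}$, and "τ creates non-empty content" to mean that the output of τ is not $\mathbb{O}$. We also check (b) in both directions so that the relation is symmetric. It uses the same hypothetical encoding as before: nodes are tuples of path components, `"O"` and `"D"` are the empty and directory values, and other strings are file contents. The function names are hypothetical.

```python
from typing import Tuple

Node = Tuple[str, ...]
Command = Tuple[Node, str, str]   # (node, input value, output value)
EMPTY, DIR = "O", "D"


def is_above(m: Node, n: Node) -> bool:
    """True if m is a proper ancestor of n."""
    return len(m) < len(n) and n[:len(m)] == m


def conflicting(s: Command, t: Command) -> bool:
    if s[0] == t[0]:
        return s != t             # (a) different commands on the same node
    for upper, lower in ((s, t), (t, s)):
        # (b) the upper command leaves a non-directory, the lower one leaves content
        if is_above(upper[0], lower[0]) and upper[2] != DIR and lower[2] != EMPTY:
            return True
    return False
```

Under this reading, the example sets $A_1$, $A_2$, $A_3$ of Section 3.2 give two conflicts. The first is $\sigma_3$ against $\tau$ (by (b)). The second is $\tau$ against $\rho_2$ (by (a)). The commands $\sigma_1$ and $\sigma_2$ of the same replica do not conflict.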

#### 3.7. Mergers Support Asynchronous and Offline Synchronization

## 4. Algorithms

#### 4.1. The up Structure

**Algorithm 1**

Code 1. Given a lexicographically sorted sequence of commands, add the up pointers.
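Algorithm 1 itself is not reproduced in this excerpt. The sketch below assumes that the up pointer of a command is the index of the command on its nearest proper ancestor node, or `None` if there is no such command. Under that assumption, all pointers can be computed in one pass with a stack, which is the standard technique for sorted path lists. Only the nodes of the sorted commands are passed in. The names are hypothetical.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, ...]


def is_above(m: Node, n: Node) -> bool:
    """True if m is a proper ancestor of n."""
    return len(m) < len(n) and n[:len(m)] == m


def add_up_pointers(nodes: List[Node]) -> List[Optional[int]]:
    """For each position, the index of the nearest ancestor in the list (or None).

    `nodes` must be sorted as tuples of path components, so that the
    descendants of every node form a contiguous block right after it.
    """
    up: List[Optional[int]] = []
    stack: List[int] = []         # indices of a chain of nested ancestors
    for i, n in enumerate(nodes):
        # A stacked node that is not an ancestor of n cannot be an ancestor
        # of any later node either, so it can be discarded for good.
        while stack and not is_above(nodes[stack[-1]], n):
            stack.pop()
        up.append(stack[-1] if stack else None)
        stack.append(i)
    return up
```

Each index is pushed and popped at most once, so the pass is linear in the number of commands, apart from the cost of comparing path prefixes. Sorting tuples rather than raw path strings matters here. As strings, `"/a-x"` sorts between `"/a"` and `"/a/b"` because `-` precedes `/`, and that would break the contiguity the stack relies on.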

#### 4.2. Checking and Ordering Canonical Sets

- If A contains a command on the node n and also on an ancestor of n, then it contains a command on the parent of n;
- If $\sigma ,\tau \in A$ are on parent–child nodes, then either $\sigma \ll \tau $ or $\tau \ll \sigma $.

**Algorithm 2**

Code 2. Check whether a set of commands is canonical.
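The following sketch checks canonicity directly. It tests the first two conditions of Definition 3 and uses the two conditions listed at the start of this subsection in place of ≪-connectedness. How those conditions relate to ≪-connectedness is our assumption. Definition 2 is not restated here, so the execution order ≪ is passed in as a caller-supplied predicate `precedes(s, t)`. For brevity, the function walks up each path to find the nearest mentioned ancestor instead of using up pointers. `is_canonical` is a hypothetical name.

```python
from typing import Callable, Dict, Iterable, Tuple

Node = Tuple[str, ...]
Command = Tuple[Node, str, str]   # (node, input value, output value)


def is_canonical(cmds: Iterable[Command],
                 precedes: Callable[[Command, Command], bool]) -> bool:
    by_node: Dict[Node, Command] = {}
    for c in cmds:
        n, x, y = c
        if x == y:                # null command
            return False
        if n in by_node:          # a second command on the same node
            return False
        by_node[n] = c
    for n, c in by_node.items():
        # nearest proper ancestor of n that carries a command, if any
        anc = next((n[:k] for k in range(len(n) - 1, -1, -1) if n[:k] in by_node), None)
        if anc is None:
            continue
        if anc != n[:-1]:         # an ancestor is mentioned but the parent is not
            return False
        p = by_node[anc]
        if not (precedes(p, c) or precedes(c, p)):  # parent-child pair unordered by <<
            return False
    return True
```

In real use, `precedes` must implement ≪ from Definition 2. The tests use constant predicates only to exercise the structural checks.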

**Algorithm 3**

Code 3. Order a canonical command set and return a canonical sequence.
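We assume, as the second condition at the start of this subsection suggests, that ≪ only has to be honored between commands on parent–child nodes. Under that assumption, each such pair contributes one precedence edge, and any topological order of the resulting forest gives a valid sequence. The sketch below uses Kahn's algorithm for this. `precedes` again stands in for ≪, and `order_canonical` is a hypothetical name.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, ...]
Command = Tuple[Node, str, str]


def order_canonical(cmds: List[Command],
                    precedes: Callable[[Command, Command], bool]) -> List[Command]:
    by_node: Dict[Node, Command] = {c[0]: c for c in cmds}
    succ: Dict[Command, List[Command]] = {c: [] for c in cmds}
    indeg: Dict[Command, int] = {c: 0 for c in cmds}
    for c in cmds:
        parent = by_node.get(c[0][:-1]) if c[0] else None
        if parent is None:
            continue
        # one edge per parent-child pair, oriented by <<
        first, second = (parent, c) if precedes(parent, c) else (c, parent)
        succ[first].append(second)
        indeg[second] += 1
    queue = deque(c for c in cmds if indeg[c] == 0)
    ordered: List[Command] = []
    while queue:                  # Kahn's topological sort
        c = queue.popleft()
        ordered.append(c)
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)
    return ordered
```

In a canonical set every command has at most one parent command. The precedence graph is therefore a forest with fewer edges than commands, and the ordering step is linear once the parent lookups are done.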

#### 4.3. Transforming a Sequence to a Canonical Set

**Algorithm 4**

Code 4. Return the canonical command set that is the semantic extension of this sequence.
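The full details of Algorithm 4 are not reproduced in this excerpt. The sketch below shows only one ingredient, and it assumes that the input sequence is non-breaking. On every node of such a sequence the commands chain, meaning each input equals the previous output. Their net effect is therefore a single command from the first input to the last output, and null results can be dropped. The result satisfies the first two conditions of Definition 3. We make no claim that it is the complete output of Algorithm 4. `net_effect` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, ...]
Command = Tuple[Node, str, str]   # (node, input value, output value)


def net_effect(seq: List[Command]) -> Set[Command]:
    """Collapse a non-breaking sequence to at most one non-null command per node."""
    first_in: Dict[Node, str] = {}
    last_out: Dict[Node, str] = {}
    for n, x, y in seq:
        first_in.setdefault(n, x)  # value at n before the sequence runs
        last_out[n] = y            # value at n after the sequence runs
    return {(n, first_in[n], last_out[n])
            for n in first_in if first_in[n] != last_out[n]}
```

For example, creating a file and then editing it collapses to a single creation with the final content. Deleting a file and recreating it with the same content disappears entirely.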

#### 4.4. Generating a Merger in Near Linear Time

**Algorithm 5**

Code 5. Given a set of jointly refluent canonical command sets, generate a merger.

## 5. Theory

**Claim 1.**

- (a) $\mathsf{\Phi}(n)=x$;
- (b) If σ is a destructor, then $\mathsf{\Phi}(n')=\mathbb{O}$ at every node $n'$ below $n$ not mentioned in $A$;
- (c) If σ is a constructor, then $\mathsf{\Phi}(n')=\mathbb{D}$ at every node $n'$ above $n$ not mentioned in $A$.

**Proof.**

#### 5.1. Characterizing Refluent Sets

**Claim 2.**

- (a) If $\sigma\in A$ and $\tau\in B$ are on the same node, then their input values are the same.
- (b) If $\sigma,\tau\in A\cup B$ are on comparable nodes, then for each node $n'$ between them, there is a command in $A\cup B$ on $n'$.
- (c) Suppose $\sigma,\tau\in A\cup B$ are on nodes $\uparrow n$ and $n$, respectively. If one of the sets mentions $n$ but not $\uparrow n$, then the input of σ is $\mathbb{D}$; if one of the sets mentions $\uparrow n$ but not $n$, then the input of τ is $\mathbb{O}$.

**Proof.**

**Proposition 1.**

**Proof.**

**Claim 3.**

**Proof.**

**Theorem 3.**

- (a) All commands on node $n$ have the same input value;
- (b) If $m$ is above $n$ and neither $I_n$ nor $I_m$ is empty, then $I_{\uparrow n}$ is non-empty as well;
- (c) If $x(\uparrow n)\ne\mathbb{D}$, then $I_n\subseteq I_{\uparrow n}$;
- (d) If $x(n)\ne\mathbb{O}$, then $I_{\uparrow n}\subseteq I_n$.

**Proof.**

**Algorithm 6**

Code 6. Given a set of canonical command sets, determine if they are jointly refluent.
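The sketch below tests conditions (a) to (d) of Theorem 3. The preamble of the theorem is not reproduced here, so we state our assumptions explicitly:

- the sets $A_1,\dots,A_k$ are given as lists of commands;
- $I_n$ is the set of indices $i$ for which $A_i$ has a command on $n$;
- $x(n)$ is the common input value on $n$ that (a) guarantees;
- (c) and (d) are checked only when both $n$ and $\uparrow n$ are mentioned.

Treat `jointly_refluent` as a hypothetical illustration of the conditions, not as a reproduction of Algorithm 6.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, ...]
Command = Tuple[Node, str, str]   # (node, input value, output value)
EMPTY, DIR = "O", "D"


def jointly_refluent(sets: List[List[Command]]) -> bool:
    inputs: Dict[Node, Set[str]] = {}
    mentioned_by: Dict[Node, Set[int]] = {}        # I_n in the text
    for i, cmds in enumerate(sets):
        for n, x, _ in cmds:
            inputs.setdefault(n, set()).add(x)
            mentioned_by.setdefault(n, set()).add(i)
    # (a) all commands on the same node share their input value
    if any(len(xs) != 1 for xs in inputs.values()):
        return False
    x = {n: next(iter(xs)) for n, xs in inputs.items()}
    for n, idx in mentioned_by.items():
        if not n:
            continue                                # the root has no parent
        parent = n[:-1]
        if parent not in mentioned_by:
            # (b) if a higher ancestor is mentioned, the parent must be too
            if any(n[:k] in mentioned_by for k in range(len(n) - 1)):
                return False
            continue
        if x[parent] != DIR and not idx <= mentioned_by[parent]:   # (c)
            return False
        if x[n] != EMPTY and not mentioned_by[parent] <= idx:      # (d)
            return False
    return True
```

Under these assumptions, the three sets $A_1$, $A_2$, $A_3$ of Section 3.2 pass the test. A pair of sets in which one replica treats `/a` as a file while the other creates `/a/b` fails condition (c).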

#### 5.2. Mergers by Conflict Resolution

**Proposition 2.**

**Proof.**

**Theorem 4.**

**Proof.**

**Proposition 3.**

**Proof.**

**Proposition 4.**

**Proof.**

## 6. Generating All Mergers

- (1) Multiple different commands on the same node with a file input value;
- (2) A pair of commands matching $\langle \uparrow n,\mathbb{D},\mathbb{O}\mathbb{F}\rangle$ and $\langle n,\mathbb{O},\mathbb{F}\mathbb{D}\rangle$;
- (3) Multiple different commands with an empty input value on the same node;
- (4) Multiple different commands with a directory input value on the same node.
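The four kinds listed above can be detected mechanically. The sketch below groups a pool of submitted commands accordingly:

- same-node conflicts are classified by their shared input value into kinds (1), (3), and (4);
- parent–child pairs matching $\langle \uparrow n,\mathbb{D},\mathbb{O}\mathbb{F}\rangle$ and $\langle n,\mathbb{O},\mathbb{F}\mathbb{D}\rangle$ give kind (2). We read $\mathbb{O}\mathbb{F}$ and $\mathbb{F}\mathbb{D}$ as "empty or a file" and "a file or a directory".

It assumes that the commands come from jointly refluent sets, so that commands on one node share an input value. The encoding and the name `classify_conflicts` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Node = Tuple[str, ...]
Command = Tuple[Node, str, str]   # (node, input value, output value)
EMPTY, DIR = "O", "D"


def classify_conflicts(cmds: Iterable[Command]) -> Dict[int, List[Tuple[Command, ...]]]:
    by_node: Dict[Node, List[Command]] = defaultdict(list)
    for c in set(cmds):                       # identical submissions count once
        by_node[c[0]].append(c)
    found: Dict[int, List[Tuple[Command, ...]]] = defaultdict(list)
    for node, cs in by_node.items():
        if len(cs) > 1:
            x = cs[0][1]                      # shared input value on this node
            kind = 3 if x == EMPTY else 4 if x == DIR else 1
            found[kind].append(tuple(sorted(cs)))
        if not node:
            continue                          # the root has no parent
        # kind (2): <parent, D, empty-or-file> against <node, O, file-or-D>
        for up in by_node.get(node[:-1], []):
            for low in cs:
                if up[1] == DIR and up[2] != DIR and low[1] == EMPTY and low[2] != EMPTY:
                    found[2].append((up, low))
    return dict(found)
```

On the union of the example sets of Section 3.2, this reports one kind (3) conflict on $\mathsf{/a/b/z}$ and three kind (2) pairs. Those pairs are $\sigma_3$ with $\rho_1$, and $\sigma_2$ with each of $\tau$ and $\rho_2$.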

**Algorithm 7**

## 7. Empirical Results

- (1) Deletes all existing files at $\mathsf{/i/u/k}$ for all $\mathsf{i}$ and $\mathsf{k}$;
- (2) Removes the (now empty) directories at $\mathsf{/i/u}$ for all $\mathsf{i}$;
- (3) For each $\mathsf{x}$ in $(\mathsf{u}-1,\mathsf{u},\mathsf{u}+1)$ modulo $S$, for all $\mathsf{i}$, and for all $\mathsf{j}\ne\mathsf{u}$, changes the file at $\mathsf{/i/j/x}$ (if it exists) into a directory;
- (4) Under each newly created directory, creates $S$ new files with unique content. These files are placed at the nodes with paths $\mathsf{/i/j/x/l}$, where $0\le\mathsf{l}<S$.
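The four steps above can be generated mechanically. The sketch below produces the commands submitted by replica $\mathsf{u}$ under a hypothetical initial state, which this excerpt does not describe. In that state, directories $\mathsf{/i}$ and $\mathsf{/i/j}$ and files $\mathsf{/i/j/k}$ exist for all $0\le \mathsf{i},\mathsf{j},\mathsf{k}<S$. The file contents are the placeholder strings returned by `initial_content`. Node names are decimal strings, and all function names are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[str, ...]
Command = Tuple[Node, str, str]   # (node, input value, output value)
EMPTY, DIR = "O", "D"


def path(*parts: int) -> Node:
    return tuple(str(p) for p in parts)


def initial_content(i: int, j: int, k: int) -> str:
    return f"orig-{i}-{j}-{k}"    # hypothetical original file content


def replica_commands(u: int, S: int) -> List[Command]:
    cmds: List[Command] = []
    # (1) delete the files /i/u/k
    for i in range(S):
        for k in range(S):
            cmds.append((path(i, u, k), initial_content(i, u, k), EMPTY))
    # (2) remove the now empty directories /i/u
    for i in range(S):
        cmds.append((path(i, u), DIR, EMPTY))
    # (3) turn the file /i/j/x into a directory, then (4) fill it with S new files
    for x in sorted({(u - 1) % S, u % S, (u + 1) % S}):
        for i in range(S):
            for j in range(S):
                if j == u:
                    continue
                cmds.append((path(i, j, x), initial_content(i, j, x), DIR))
                for ell in range(S):          # l in the text
                    cmds.append((path(i, j, x, ell), EMPTY, f"new-{u}-{i}-{j}-{x}-{ell}"))
    return cmds
```

Under the assumed initial state with $S=4$, each replica submits $16+4+36+144=200$ commands, all on distinct nodes. In general, for $S\ge 3$ the count is $S^2+S+3S(S-1)(S+1)$.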

## 8. Conclusions

#### 8.1. Node Attributes

#### 8.2. Filesystems on Directed Acyclic Graphs

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest


**Figure 1.** The synchronization cycle. Identical copies of the same filesystem are edited independently. Each replica sends the locally created update information to the synchronizer, which returns the commands to be executed on the local copy to update it to a common synchronized state.

**Figure 2.** Asynchronous synchronization. Top line: after the synchronization request has been sent, additional local modifications are made to the filesystem. When the synchronization commands arrive, they are modified according to the current state of the local filesystem. Bottom line: the end result should be the same as applying the synchronization commands immediately and making the local modifications afterwards.

**Figure 3.** Running time for generating the synchronized state on several synthetic data sets. The running time depends only on the total number of filesystem commands (x axis), and not on the number of replicas (color).


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Csirmaz, E.P.; Csirmaz, L.
Synchronizing Many Filesystems in Near Linear Time. *Future Internet* **2023**, *15*, 198.
https://doi.org/10.3390/fi15060198

**Chicago/Turabian Style**

Csirmaz, Elod P., and Laszlo Csirmaz.
2023. "Synchronizing Many Filesystems in Near Linear Time" *Future Internet* 15, no. 6: 198.
https://doi.org/10.3390/fi15060198