1. Problem Statement
A critical issue in CRDT-based systems, including Loro, Yjs, and Automerge, is the problem of container overwriting during concurrent operations. This occurs when two peers independently insert a new container (e.g., a List) into a Map at the same key. It's also mentioned in our docs: https://loro.dev/docs/advanced/cid#container-overwrites
The root cause of this behavior lies in the design of Container IDs. In a concurrent setting, each peer generates a unique Normal Container ID (e.g., from `peer_id` + `counter`) for their new container. When the operations are merged, the system treats these as two distinct insertions at the same key, and one operation ultimately overwrites the other.
This leads to a user experience that feels like data loss. Although the overwritten data still exists in the document's history, the final state reflects only one peer's contribution, which is unintuitive and disruptive. Users currently have to implement manual workarounds.
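The overwrite can be illustrated with a toy model. This is a sketch under stated assumptions, not Loro's implementation: the `SlotWrite` type, the `resolveSlot` tie-breaking rule (higher Lamport timestamp, then higher peer ID), and the ID string layout are all hypothetical stand-ins chosen for illustration.

```typescript
// Toy model only: one map slot resolved by last-writer-wins.
interface SlotWrite {
  lamport: number; // logical timestamp of the insertion
  peer: number; // peer ID, used here as the tie-breaker (an assumption)
  containerId: string; // the child container this write points to
}

// Normal Container IDs combine the creating peer and its op counter,
// so two peers never produce the same ID. The string layout is illustrative.
function normalContainerId(peer: number, counter: number): string {
  return `cid:${counter}@${peer}:List`;
}

// Pick the single winning write for a key: higher lamport, then higher peer.
function resolveSlot(writes: SlotWrite[]): SlotWrite {
  return writes.reduce((best, w) =>
    w.lamport > best.lamport ||
    (w.lamport === best.lamport && w.peer > best.peer)
      ? w
      : best
  );
}

// Each peer creates its own list and fills it.
const listContents = new Map<string, string[]>();
const idA = normalContainerId(1, 0);
const idB = normalContainerId(2, 0);
listContents.set(idA, ["X"]);
listContents.set(idB, ["Y"]);

// After merging, both writes target the same key and only one is visible.
const winner = resolveSlot([
  { lamport: 1, peer: 1, containerId: idA },
  { lamport: 1, peer: 2, containerId: idB },
]);
const visibleItems = listContents.get(winner.containerId) ?? [];
```

In this model both lists still exist in `listContents`, but only the list behind the winning write is reachable from the map, which is why the result feels like data loss.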
2. Desired Outcome
The goal is to change this behavior to align with user expectations. When container creation happens concurrently, the system should automatically merge their contents.
To be precise, consider this scenario:
- Peer A inserts a new `List` at `map.key` and adds item `X` to it.
- Concurrently, Peer B also inserts a new `List` at the same `map.key` and adds item `Y` to it.
Under the current system, the final document would contain a List with either only X or only Y. The desired outcome is a single, merged List that contains both X and Y. This behavior is more intuitive and preserves the work of all collaborators.
3. Proposed Solution
We propose a solution that leverages the flexibility of our existing Root Container ID mechanism to avoid the significant breaking changes that would come from introducing entirely new container types.
3.1. Core Strategy: Leveraging "Mergeable" Container IDs
The fundamental idea is to utilize a special type of Container ID for insertions that should be mergeable.
- Background:
  - Normal Container ID: Generated from `peer_id` + `counter`. Concurrent operations generate different IDs, leading to the overwrite problem.
  - Root Container ID: A user-defined string. We can allow concurrent operations to create the same ID.
Our strategy is to create containers that are children of a Map but use a Root Container ID-style identifier to ensure they can be merged.
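A minimal sketch of why a shared ID avoids the overwrite, again as a toy model rather than Loro's actual list CRDT. The `ListInsert` type, the `materializeLists` helper, the peer-based ordering, and the `sharedId` value are hypothetical.

```typescript
// Toy model: a list insert addressed to a container by ID.
interface ListInsert {
  containerId: string;
  peer: number;
  value: string;
}

// Group inserts by container ID. Ordering by peer is a simplification that
// keeps the toy deterministic; a real list CRDT orders elements differently.
function materializeLists(ops: ListInsert[]): Map<string, string[]> {
  const lists = new Map<string, string[]>();
  for (const op of [...ops].sort((a, b) => a.peer - b.peer)) {
    const items = lists.get(op.containerId) ?? [];
    items.push(op.value);
    lists.set(op.containerId, items);
  }
  return lists;
}

// Both peers address the same container ID, so their inserts land in one list.
const sharedId = "example-shared-id"; // hypothetical ID, for illustration only
const mergedLists = materializeLists([
  { containerId: sharedId, peer: 2, value: "Y" },
  { containerId: sharedId, peer: 1, value: "X" },
]);
const mergedItems = mergedLists.get(sharedId) ?? [];
```

Because both map writes point to the same container, it no longer matters which write wins the slot: the visible list holds the contributions of both peers.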
3.2. Detailed Mechanism
- New API Introduction:
  We will introduce new methods on the `LoroMap` API, such as:
  - `getMergeableList(key)`
  - `getMergeableMap(key)`
  - `getOrInsertMergeableList(key)`
- Architectural Adjustment:
  A key architectural change is required: historically, `Root Containers` do not have a parent. In this new model, these special mergeable containers, which use a `Root Container ID` format, will have a parent. This allows them to exist within other containers like `Map`.
- Generating the Mergeable ID:
  When a method like `insertMergeableList` is called, the operation will insert a container with a `Root Container ID` type. To ensure concurrent operations on different peers produce the exact same ID, the ID will be deterministically generated. The proposed format for the ID string is:

  `string(parent_map_container_id) + ":" + key`

  Crucially, to ensure these system-generated IDs do not conflict with user-defined root container names, the string format will include special characters that are otherwise illegal for user-created `Root Container IDs`. This effectively creates a private namespace, allowing the system to safely distinguish between true, user-created root containers and these internal, mergeable ones.
- Naming and Semantics:
  Using the name `Root Container ID` for these child containers is semantically confusing. We should consider introducing a new internal classification, such as `Mergeable Container ID`, to distinguish them from true, top-level root containers.
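The mechanism above could be sketched as follows. Everything here is hypothetical: the proposal does not say which reserved characters will be used, so `MERGEABLE_MARKER` assumes `/` is rejected in user-defined root names, and `ToyDoc` is a stand-in registry rather than any real Loro type.

```typescript
// Hypothetical marker: assumes "/" is not allowed in user-defined root names.
const MERGEABLE_MARKER = "/";

// Deterministic name: marker + string(parent_map_container_id) + ":" + key.
function mergeableName(parentId: string, key: string): string {
  return `${MERGEABLE_MARKER}${parentId}:${key}`;
}

// Toy registry standing in for a document's container store.
class ToyDoc {
  private parents = new Map<string, string>();

  // Returns the same name on every peer; creates the entry on first use.
  getOrInsertMergeable(parentId: string, key: string): string {
    const name = mergeableName(parentId, key);
    if (!this.parents.has(name)) {
      // Unlike a true root container, the mergeable one records a parent.
      this.parents.set(name, parentId);
    }
    return name;
  }

  parentOf(name: string): string | undefined {
    return this.parents.get(name);
  }
}

// Two peers act independently on the same parent map and key.
const parentMapId = "cid:root-doc:Map"; // illustrative parent ID
const peerA = new ToyDoc();
const peerB = new ToyDoc();
const nameOnA = peerA.getOrInsertMergeable(parentMapId, "items");
const nameOnB = peerB.getOrInsertMergeable(parentMapId, "items");
```

Since the name depends only on the parent ID and the key, `nameOnA` and `nameOnB` are identical without any coordination between the peers, and the marker keeps them apart from names a user could choose.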
4. Impacted Areas & Required Changes
Implementing this solution will require modifications across several parts of the system:
- Parsing Logic: The logic for parsing `Map` contents must be updated to handle these parented containers with `Root Container ID` types.
- Serialization (`toJSON`): The `toJSON` function and other serialization methods must be updated. Specifically, when listing top-level containers, the system must filter out and exclude these "mergeable" containers, which can be reliably identified by the special, non-user-permissible characters in their ID string.
- Parent-Child Relationship Logic: Code that manages container parent-child relationships and checks for properties like container depth may need adjustments.
- Snapshot Logic (`toShallowSnapshot`): Snapshot generation will need to be verified and potentially updated to handle these mergeable containers correctly.
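The serialization filter might look like the sketch below. It is not the real `toJSON` code: `RESERVED_MARKER`, `isMergeableId`, and `topLevelJson` are hypothetical names, and the marker character is an assumption because the proposal leaves the reserved characters unspecified.

```typescript
// Hypothetical marker: assumes "/" cannot appear in user-defined root names.
const RESERVED_MARKER = "/";

// A root-style name is treated as mergeable if it carries the reserved marker.
function isMergeableId(name: string): boolean {
  return name.startsWith(RESERVED_MARKER);
}

// Build the top-level JSON view, skipping mergeable containers, which are
// expected to appear under their parent map rather than at the top level.
function topLevelJson(containers: Map<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [name, value] of containers) {
    if (isMergeableId(name)) continue;
    out[name] = value;
  }
  return out;
}

// One user-created root and one system-generated mergeable container.
const rootStyleContainers = new Map<string, unknown>([
  ["todos", { title: "inbox" }],
  ["/cid:root-todos:Map:items", ["X", "Y"]],
]);
const topLevel = topLevelJson(rootStyleContainers);
```

Only `todos` appears in `topLevel`; the mergeable list is excluded because its name starts with the reserved marker.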
5. Compatibility
This approach is designed with compatibility in mind.
- Backward Compatibility: This solution fully guarantees backward compatibility. New clients will be able to correctly parse, import, and sync all existing documents created with older versions.
- Forward Compatibility: The data formats remain compatible, but there are important nuances for older clients interacting with documents that use the new `Mergeable Container` feature:
  - State Representation: An old client loading a new document will likely fail to represent the document state correctly. A call to `doc.toJSON()` or the state derived from events may be incorrect or inconsistent with the state seen by a new client.
  - Data Integrity and Sync: Despite the incorrect state representation, the core data integrity is maintained. The old client can still correctly perform fundamental operations:
    - Load and export snapshots.
    - Receive updates from and send updates to other peers.

  In summary, old clients can continue to participate in the collaborative network and act as a relay for updates without data corruption, but they will not be able to correctly interpret or display the contents of these new mergeable containers.