Support Automatic Merging for Concurrent Container Inserts in Maps #759

@zxch3n

1. Problem Statement

CRDT-based systems, including Loro, Yjs, and Automerge, share a critical issue: container overwriting during concurrent operations. It occurs when two peers independently insert a new container (e.g., a List) into a Map at the same key. It is also described in our docs: https://loro.dev/docs/advanced/cid#container-overwrites

The root cause of this behavior lies in the design of Container IDs. In a concurrent setting, each peer generates a unique Normal Container ID (e.g., from peer_id + counter) for its new container. When the operations are merged, the system treats these as two distinct insertions at the same key, and one operation ultimately overwrites the other.

This leads to a user experience that feels like data loss. Although the overwritten data still exists in the document's history, the final state reflects the contribution of only one peer, which is unintuitive and disruptive. For now, users have to implement manual workarounds.

2. Desired Outcome

The goal is to change this behavior to align with user expectations. When two peers concurrently create a container at the same key, the system should automatically merge the contents of those containers.

To be precise, consider this scenario:

  1. Peer A inserts a new List at map.key and adds item X to it.
  2. Concurrently, Peer B also inserts a new List at the same map.key and adds item Y to it.

Under the current system, the final document would contain a List with either only X or only Y. The desired outcome is a single, merged List that contains both X and Y. This behavior is more intuitive and preserves the work of all collaborators.
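The difference between the two outcomes can be illustrated with a toy model. This is only a sketch, not Loro's implementation: the `MapInsert` and `ListInsert` types, the `visibleItems` and `scenario` helpers, and the last-writer-wins tie-break on peer ID are all assumptions made for illustration.

```typescript
// Toy model only: these types are illustrative, not Loro's internal structures.
interface MapInsert {
  key: string;
  containerId: string; // ID of the child container placed at `key`
  lamport: number;     // logical clock used for last-writer-wins
  peer: number;        // tie-breaker when lamport values are equal
}

interface ListInsert {
  containerId: string; // list the item was inserted into
  value: string;
}

// Pick the winning map entry by last-writer-wins, then return the items
// that were inserted into the winning container.
function visibleItems(mapOps: MapInsert[], listOps: ListInsert[], key: string): string[] {
  const candidates = mapOps.filter((op) => op.key === key);
  if (candidates.length === 0) {
    return [];
  }
  const winner = candidates.reduce((a, b) =>
    b.lamport > a.lamport || (b.lamport === a.lamport && b.peer > a.peer) ? b : a
  );
  return listOps
    .filter((op) => op.containerId === winner.containerId)
    .map((op) => op.value)
    .sort(); // stand-in for the list CRDT's own ordering
}

// Peer A (peer 1) and Peer B (peer 2) concurrently insert a List at "key".
function scenario(idA: string, idB: string): string[] {
  const mapOps: MapInsert[] = [
    { key: "key", containerId: idA, lamport: 0, peer: 1 },
    { key: "key", containerId: idB, lamport: 0, peer: 2 },
  ];
  const listOps: ListInsert[] = [
    { containerId: idA, value: "X" },
    { containerId: idB, value: "Y" },
  ];
  return visibleItems(mapOps, listOps, "key");
}

// Peer-specific IDs: only the winning peer's list is visible.
const overwritten = scenario("0@1", "0@2"); // ["Y"]

// Identical IDs: both peers wrote into the same list, so nothing is lost.
const merged = scenario("shared", "shared"); // ["X", "Y"]
```

In this model the map-level conflict still resolves to a single winner, but when both inserts name the same container the winner is the same list either way, so X and Y both survive.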

3. Proposed Solution

We propose a solution that leverages the flexibility of our existing Root Container ID mechanism to avoid the significant breaking changes that would come from introducing entirely new container types.

3.1. Core Strategy: Leveraging "Mergeable" Container IDs

The fundamental idea is to utilize a special type of Container ID for insertions that should be mergeable.

  • Background:
    • Normal Container ID: Generated from peer_id + counter. Concurrent operations generate different IDs, leading to the overwrite problem.
    • Root Container ID: A user-defined string. Because the ID is derived from a name rather than from a peer, concurrent operations can produce the same ID.

Our strategy is to create containers that are children of a Map but use a Root Container ID-style identifier to ensure they can be merged.

3.2. Detailed Mechanism

  1. New API Introduction:
    We will introduce new methods on the LoroMap API, such as:

    • getMergeableList(key)
    • getMergeableMap(key)
    • getOrInsertMergeableList(key)

  2. Architectural Adjustment:
    A key architectural change is required: historically, Root Containers do not have a parent. In this new model, these special mergeable containers, which use a Root Container ID format, will have a parent. This allows them to exist within other containers like Map.

  3. Generating the Mergeable ID:
    When a method like getOrInsertMergeableList is called, the operation will insert a container with a Root Container ID type. To ensure that concurrent operations on different peers produce exactly the same ID, the ID will be generated deterministically. The proposed format for the ID string is:

    string(parent_map_container_id) + ":" + key

    Crucially, to ensure these system-generated IDs do not conflict with user-defined root container names, the string format will include special characters that are otherwise illegal for user-created Root Container IDs. This effectively creates a private namespace, allowing the system to safely distinguish between true, user-created root containers and these internal, mergeable ones.

  4. Naming and Semantics:
    Using the name Root Container ID for these child containers is semantically confusing. We should consider introducing a new internal classification, such as Mergeable Container ID, to distinguish them from true, top-level root containers.
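The ID generation described in step 3 could look roughly like the sketch below. The function names, the choice of `"\u0000"` as the reserved marker, its placement as a prefix, and the example parent ID string are all assumptions; the proposal only says that the name combines the parent map's container ID, `":"`, and the key, and that it contains characters users cannot use.

```typescript
// Hypothetical marker: the issue does not say which reserved characters are used.
const MERGEABLE_MARKER = "\u0000";

// Normal container IDs are peer-specific, so two peers never produce the same one.
function normalId(peer: number, counter: number): string {
  return counter + "@" + peer;
}

// Deterministic name following the proposed
// `string(parent_map_container_id) + ":" + key`, prefixed with the reserved
// marker (an assumption) to keep it out of the user namespace.
function mergeableName(parentMapContainerId: string, key: string): string {
  return MERGEABLE_MARKER + parentMapContainerId + ":" + key;
}

// A name is system-generated if it starts with the reserved marker.
function isMergeableName(name: string): boolean {
  return name.charAt(0) === MERGEABLE_MARKER;
}

// Illustrative parent ID; the exact string format is assumed here.
const parentId = "cid:root-doc:Map";

// Both peers compute the name locally, with no coordination.
const fromPeerA = mergeableName(parentId, "todos");
const fromPeerB = mergeableName(parentId, "todos");
```

Because the name depends only on the parent ID and the key, `fromPeerA` and `fromPeerB` are identical, whereas `normalId(1, 0)` and `normalId(2, 0)` differ. A user-chosen root name such as `"todos"` can never collide with a generated name as long as the marker is rejected in user input.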

4. Impacted Areas & Required Changes

Implementing this solution will require modifications across several parts of the system:

  • Parsing Logic: The logic for parsing Map contents must be updated to handle these parented containers with Root Container ID types.
  • Serialization (toJSON): The toJSON function and other serialization methods must be updated. Specifically, when listing top-level containers, the system must filter out and exclude these "mergeable" containers, which can be reliably identified by the special, non-user-permissible characters in their ID string.
  • Parent-Child Relationship Logic: Code that manages container parent-child relationships and checks for properties like container depth may need adjustments.
  • Snapshot Logic (toShallowSnapshot): Snapshot generation will need to be verified and potentially updated to handle these mergeable containers correctly.
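As an illustration of the serialization change, the sketch below filters mergeable containers out of a top-level listing. It is a simplified model, not the real toJSON: `RootEntry`, `topLevelToJson`, and the `"\u0000"` prefix are hypothetical stand-ins for whatever internal structures and reserved characters the implementation ends up using.

```typescript
// Hypothetical reserved prefix marking system-generated mergeable containers.
const RESERVED_PREFIX = "\u0000";

// Simplified view of a container that carries a Root Container ID-style name.
interface RootEntry {
  name: string;
  value: unknown;
}

// Build the top-level JSON object, skipping mergeable containers.
// They are rendered through their parent Map instead of at the top level.
function topLevelToJson(roots: RootEntry[]): { [name: string]: unknown } {
  const out: { [name: string]: unknown } = {};
  for (const root of roots) {
    if (root.name.charAt(0) === RESERVED_PREFIX) {
      continue; // internal mergeable container, not a true root
    }
    out[root.name] = root.value;
  }
  return out;
}

const json = topLevelToJson([
  { name: "settings", value: { theme: "dark" } },
  { name: RESERVED_PREFIX + "cid:root-settings:Map:tags", value: ["a", "b"] },
]);
```

Here `json` contains only the `settings` entry; the mergeable `tags` container would be expected to appear nested under its parent Map when that Map is serialized.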

5. Compatibility

This approach is designed with compatibility in mind.

  • Backward Compatibility: This solution fully guarantees backward compatibility. New clients will be able to correctly parse, import, and sync all existing documents created with older versions.

  • Forward Compatibility: The data formats remain compatible, but there are important nuances for older clients interacting with documents that use the new Mergeable Container feature:

    • State Representation: An old client loading a new document will likely fail to represent the document state correctly. A call to doc.toJSON() or the state derived from events may be incorrect or inconsistent with the state seen by a new client.
    • Data Integrity and Sync: Despite the incorrect state representation, the core data integrity is maintained. The old client can still correctly perform fundamental operations:
      • Load and export snapshots.
      • Receive updates from and send updates to other peers.

    In summary, old clients can continue to participate in the collaborative network and act as a relay for updates without data corruption, but they will not be able to correctly interpret or display the contents of these new mergeable containers.

Labels: dx (Dev Experience), enhancement (New feature or request)
