Async Replication
Async replication, generally available since the 1.29 release, is a mechanism that ensures eventual consistency across nodes in a distributed cluster. It runs as a background process that automatically keeps nodes in sync without requiring user queries. Previously, consistency was achieved through "read repair", in which nodes compared data during a read request and exchanged missing or outdated information. Async replication, by contrast, guarantees eventual consistency without requiring read operations.
This applies solely to data objects, as metadata consistency is treated differently (through RAFT consensus).
Under the Hood
- Async replication operates as a background process either per tenant (in a multi-tenant collection) or per shard (in a non-multi-tenant collection).
- It is disabled by default but can be enabled through collection configuration changes, similar to setting the replication factor.
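As a rough illustration of the second point, the sketch below builds the kind of collection definition that would turn the feature on. The `replicationConfig.asyncEnabled` field name and the collection name `Article` are assumptions made for this example; check the collection configuration reference for your Weaviate version before relying on them.

```python
import json


def replication_settings(factor: int, async_enabled: bool) -> dict:
    """Build a replication block for a collection definition (field names are assumed)."""
    if factor < 1:
        raise ValueError("replication factor must be at least 1")
    return {"factor": factor, "asyncEnabled": async_enabled}


collection_definition = {
    "class": "Article",  # hypothetical collection name
    "replicationConfig": replication_settings(factor=3, async_enabled=True),
}

print(json.dumps(collection_definition, indent=2))
```

Async replication only has work to do when a shard is stored on more than one node, which is why the sketch sets a replication factor above 1.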
Environment Variable Deep Dive
These environment variables can be used to fine-tune behavior for your specific use case or deployment environment.
The optimal values for these variables ultimately depend on factors such as data size, network conditions, write patterns, and the desired level of eventual consistency.
Use Cases
General
Feature Control
ASYNC_REPLICATION_DISABLED
Globally disables the entire async replication feature.
- Its default value is false.
- Use case: This is useful when you have many tenants or collections where a temporary global disable is needed, like during debugging or critical maintenance.
- Special Considerations:
- This overrides any collection configuration.
Replication Control
ASYNC_REPLICATION_PROPAGATION_LIMIT
Defines the maximum number of objects that will be propagated in a single async replication iteration (after one hash tree comparison).
- By default it is set to 10,000.
- Use Case(s): Can be adjusted based on network capacity and the desired rate of convergence.
- Considerations: Even if more than this number of differences are detected, only this many objects will be propagated in the current iteration. Subsequent iterations will handle the remaining differences.
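To make the effect of the limit concrete, here is a minimal sketch of how a backlog of differences would be drained across iterations. It assumes no new differences appear while the backlog is being worked off; the function name is hypothetical and not part of any Weaviate API.

```python
def propagated_per_iteration(differences: int, propagation_limit: int = 10_000) -> list[int]:
    """Return how many objects each iteration would propagate until the backlog is empty."""
    if propagation_limit <= 0:
        raise ValueError("propagation_limit must be positive")
    plan = []
    remaining = differences
    while remaining > 0:
        step = min(remaining, propagation_limit)  # never exceed the per-iteration cap
        plan.append(step)
        remaining -= step
    return plan


print(propagated_per_iteration(25_000))  # [10000, 10000, 5000]
```

With the default limit, 25,000 detected differences take three iterations to clear. Raising the limit shortens that at the cost of more network traffic per iteration.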
ASYNC_REPLICATION_PROPAGATION_DELAY
Introduces a delay before considering an object for propagation. Only objects older than this delay are considered.
- By default it is set to 30 seconds.
- Use Case(s): If an object is inserted into one node but the insertion is still in progress, the hash comparison might detect it. This delay prevents the async replication from trying to propagate it before the local write operation is fully complete.
- Considerations: This should be set based on the typical write latency of the system.
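The filtering rule can be sketched as follows. This is an illustration of the idea only: the object IDs and the `last_updated` mapping are hypothetical, and the real implementation may track object age differently.

```python
from datetime import datetime, timedelta, timezone

PROPAGATION_DELAY = timedelta(seconds=30)  # default documented above


def eligible_for_propagation(last_updated: dict, now: datetime,
                             delay: timedelta = PROPAGATION_DELAY) -> list:
    """Return the IDs of objects whose last update is older than `delay`."""
    cutoff = now - delay
    return sorted(oid for oid, updated in last_updated.items() if updated < cutoff)


now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
last_updated = {
    "obj-settled": now - timedelta(minutes=5),     # written long ago
    "obj-in-flight": now - timedelta(seconds=10),  # write may still be in progress
}
print(eligible_for_propagation(last_updated, now))  # ['obj-settled']
```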
Operational Visibility
ASYNC_REPLICATION_LOGGING_FREQUENCY
Controls how often the background async replication process logs its activity.
- By default it is set to 5 seconds.
- Use Case(s): Logging more often (a shorter interval) provides more detailed logs, while logging less often (a longer interval) reduces log verbosity.
Performance Tuning
Memory Optimization
ASYNC_REPLICATION_HASHTREE_HEIGHT
Customizes the height of the hash tree built by each node to represent its locally stored data.
- By default the value is set to 16, which corresponds to roughly 2 MB of RAM per shard on each node.
- Use case(s):
- In multi-tenant setups with a large number of tenants, reducing the hash tree height would minimize the memory footprint.
- For very large collections, a larger hash tree could be more beneficial for more efficient identification of differing data ranges.
- Special Considerations:
- Modification of the hash tree height requires rebuilding the hash tree on each node, which involves iterating over all existing objects.
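The following back-of-the-envelope sketch shows how memory could scale with the height. It assumes a complete binary tree that stores one 16-byte digest per node; both assumptions are ours and are not confirmed here, although they happen to reproduce the roughly 2 MB figure quoted for a height of 16.

```python
def estimated_hashtree_bytes(height: int, digest_bytes: int = 16) -> int:
    """Estimate memory for a complete binary hash tree (assumed layout, one digest per node)."""
    node_count = 2 ** (height + 1) - 1  # leaves plus all internal nodes
    return node_count * digest_bytes


for height in (10, 16, 20):
    mib = estimated_hashtree_bytes(height) / 2**20
    print(f"height={height}: ~{mib:.2f} MiB per shard")
```

Under these assumptions each extra level roughly doubles the footprint, which is why lowering the height matters when a node hosts many tenants.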
Throughput and Concurrency
ASYNC_REPLICATION_PROPAGATION_CONCURRENCY
Controls the number of concurrent goroutines (or threads) used to send batches of objects during the propagation phase.
- By default it is set to 5.
- Considerations: Increasing concurrency can improve propagation speed, but needs to be balanced with potential resource contention (CPU, network).
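The trade-off can be pictured with a bounded worker pool. This sketch uses Python threads as a stand-in for goroutines, and `send_batch` is a hypothetical placeholder for the network call; it is not how the server itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

PROPAGATION_CONCURRENCY = 5  # default documented above


def send_batch(batch: list) -> int:
    """Hypothetical stand-in for sending one batch to a remote node."""
    return len(batch)  # pretend every object was accepted


def propagate(batches: list, concurrency: int = PROPAGATION_CONCURRENCY) -> int:
    """Send batches with at most `concurrency` requests in flight; return objects sent."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(send_batch, batches))


batches = [list(range(100)) for _ in range(12)]
print(propagate(batches))  # 1200
```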
Batch Processing
ASYNC_REPLICATION_DIFF_BATCH_SIZE
Sets the number of object metadata entries fetched per request during the comparison phase.
- By default it is set to 1000.
- Use Case(s): May be increased to potentially improve performance if network latency is low and nodes can handle larger requests.
- Considerations: Fetching metadata in batches optimizes network communication.
ASYNC_REPLICATION_PROPAGATION_BATCH_SIZE
Sets the maximum number of objects included in each batch when propagating data to a remote node.
- By default it is set to 100.
- Use Case(s):
- For large objects, reducing the batch size can help manage memory usage during propagation. The batch size could be similar to the batch size used during initial data insertion.
- For smaller objects, increasing the batch size might improve propagation efficiency by reducing the overhead of individual requests, but needs to be balanced with potential memory pressure.
- Considerations: This setting is particularly important for large objects, as larger batches can lead to higher memory consumption during transmission. Multiple batches may be sent within a single iteration to reach the ASYNC_REPLICATION_PROPAGATION_LIMIT.
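How the batch size and the propagation limit interact can be sketched as below. The helper name and the object IDs are hypothetical; the point is only that one iteration is capped by the limit and then split into batches.

```python
def split_into_batches(object_ids: list, batch_size: int = 100,
                       propagation_limit: int = 10_000) -> list:
    """Cap one iteration at `propagation_limit` objects, then chunk them into batches."""
    selected = object_ids[:propagation_limit]  # the rest waits for a later iteration
    return [selected[i:i + batch_size] for i in range(0, len(selected), batch_size)]


ids = [f"obj-{n}" for n in range(10_250)]
batches = split_into_batches(ids)
print(len(batches), len(batches[-1]))  # 100 100
```

With the defaults, a full iteration sends 100 batches of 100 objects each, and the remaining 250 objects in this example are left for the next iteration.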
Consistency Tuning
Synchronization Frequency
ASYNC_REPLICATION_FREQUENCY
Defines how often each node initiates the process of comparing its local data (via the hash tree) with other nodes storing the same shard. This regularly checks for inconsistencies, even if no changes have been explicitly triggered.
- Its default value is 30 seconds.
- Use Case(s):
- Decreasing the value (comparing more often) can be beneficial for applications that require faster convergence to eventual consistency.
- Increasing the value (comparing less often) can be beneficial for reducing the load on the system, at the cost of slower convergence.
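For readers unfamiliar with hash tree comparison, here is a generic Merkle-style sketch of the idea: each node summarizes its objects into a tree of digests, and two nodes only descend into subtrees whose digests differ. The use of SHA-256, the bucketing by object ID, and all function names are assumptions chosen for illustration; this is not Weaviate's actual implementation.

```python
import hashlib


def leaf_index(object_id: str, height: int) -> int:
    """Map an object ID to one of 2**height leaves."""
    digest = hashlib.sha256(object_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % (2 ** height)


def build_tree(objects: dict, height: int) -> list:
    """Return the tree as a list of levels, root first. `objects` maps ID -> version."""
    leaves = [hashlib.sha256() for _ in range(2 ** height)]
    for object_id in sorted(objects):  # sorted so every node hashes in the same order
        entry = f"{object_id}:{objects[object_id]}\n".encode()
        leaves[leaf_index(object_id, height)].update(entry)
    levels = [[leaf.digest() for leaf in leaves]]
    while len(levels[0]) > 1:
        below = levels[0]
        above = [hashlib.sha256(below[i] + below[i + 1]).digest()
                 for i in range(0, len(below), 2)]
        levels.insert(0, above)
    return levels


def differing_leaves(local: list, remote: list) -> list:
    """Walk both trees from the root, descending only where digests differ."""
    suspects = [0] if local[0][0] != remote[0][0] else []
    for depth in range(1, len(local)):
        suspects = [child
                    for parent in suspects
                    for child in (2 * parent, 2 * parent + 1)
                    if local[depth][child] != remote[depth][child]]
    return suspects


node_a = {"obj-1": 1, "obj-2": 1, "obj-3": 2}
node_b = {"obj-1": 1, "obj-2": 1, "obj-3": 1}  # stale copy of obj-3
tree_a, tree_b = build_tree(node_a, 4), build_tree(node_b, 4)
print(differing_leaves(tree_a, tree_b) == [leaf_index("obj-3", 4)])  # True
```

When the roots match, the comparison stops immediately, which is what makes periodic checks cheap when the nodes are already in sync.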
ASYNC_REPLICATION_FREQUENCY_WHILE_PROPAGATING
Defines a shorter interval for subsequent comparison and propagation attempts when a previous propagation cycle did not complete (i.e., not all detected differences were synchronized).
- By default it is set to 20 milliseconds.
- Use Case(s): When inconsistencies are known to exist, this expedites the synchronization process.
- Considerations: This is activated after a propagation cycle detects differences but does not propagate all of them due to limits.
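The choice between the two intervals can be sketched as a small scheduling rule. The function name and its parameters are hypothetical; the constants are the defaults documented for ASYNC_REPLICATION_FREQUENCY and ASYNC_REPLICATION_FREQUENCY_WHILE_PROPAGATING.

```python
FREQUENCY_SECONDS = 30.0                    # ASYNC_REPLICATION_FREQUENCY default
FREQUENCY_WHILE_PROPAGATING_SECONDS = 0.02  # 20 milliseconds


def next_wait_seconds(detected: int, propagated: int) -> float:
    """Pick the wait before the next comparison, based on the outcome of the last cycle."""
    if propagated < detected:
        # Differences are known to remain, so retry almost immediately.
        return FREQUENCY_WHILE_PROPAGATING_SECONDS
    return FREQUENCY_SECONDS


print(next_wait_seconds(detected=25_000, propagated=10_000))  # 0.02
print(next_wait_seconds(detected=300, propagated=300))        # 30.0
```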
Node Status Monitoring
ASYNC_REPLICATION_ALIVE_NODES_CHECKING_FREQUENCY
Defines the frequency at which the system checks for changes in the availability of nodes within the cluster.
- By default it is set to 5 seconds.
- Use Case(s): When a node rejoins the cluster after a period of downtime, it is highly likely to be out of sync. This setting ensures that the replication process is initiated promptly.
Timeout Management
ASYNC_REPLICATION_DIFF_PER_NODE_TIMEOUT
Defines the maximum time to wait for a response when requesting object metadata from a remote node during the comparison phase. This prevents indefinite blocking if a node is unresponsive.
- By default it is set to 10 seconds.
- Use Case(s): May need to be increased in environments with high network latency or potentially slow-responding nodes.
ASYNC_REPLICATION_PROPAGATION_TIMEOUT
Sets the maximum time allowed for a single propagation request (sending actual object data) to a remote node.
- By default it is set to 30 seconds.
- Use Case(s): May need to be increased in scenarios with high network latency, large object sizes (e.g., images, vectors), or when sending large batches of objects.
- Considerations: Network latency, batch size, and the size of the objects being propagated can all affect timeouts.
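As a quick audit aid, the sketch below collects the defaults stated on this page and reports which variables are overridden in the current environment. The defaults are recorded as human-readable descriptions rather than in the exact value syntax the server expects, which is not specified here; the helper name is hypothetical.

```python
import os

# Defaults as documented above (descriptive text, not the server's value syntax).
DOCUMENTED_DEFAULTS = {
    "ASYNC_REPLICATION_DISABLED": "false",
    "ASYNC_REPLICATION_PROPAGATION_LIMIT": "10,000 objects",
    "ASYNC_REPLICATION_PROPAGATION_DELAY": "30 seconds",
    "ASYNC_REPLICATION_LOGGING_FREQUENCY": "5 seconds",
    "ASYNC_REPLICATION_HASHTREE_HEIGHT": "16",
    "ASYNC_REPLICATION_PROPAGATION_CONCURRENCY": "5",
    "ASYNC_REPLICATION_DIFF_BATCH_SIZE": "1000",
    "ASYNC_REPLICATION_PROPAGATION_BATCH_SIZE": "100",
    "ASYNC_REPLICATION_FREQUENCY": "30 seconds",
    "ASYNC_REPLICATION_FREQUENCY_WHILE_PROPAGATING": "20 milliseconds",
    "ASYNC_REPLICATION_ALIVE_NODES_CHECKING_FREQUENCY": "5 seconds",
    "ASYNC_REPLICATION_DIFF_PER_NODE_TIMEOUT": "10 seconds",
    "ASYNC_REPLICATION_PROPAGATION_TIMEOUT": "30 seconds",
}


def overrides(environ=os.environ) -> dict:
    """Return the async replication variables that are explicitly set in `environ`."""
    return {name: environ[name] for name in DOCUMENTED_DEFAULTS if name in environ}


for name, value in overrides().items():
    print(f"{name}={value} (documented default: {DOCUMENTED_DEFAULTS[name]})")
```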