[MongoDB] Cluster-to-Cluster Sync: MongoDB data synchronization process
7/1/2024 · 2 min read
In this article, I will introduce you to some methods of cluster-to-cluster synchronization in MongoDB.
Natively, MongoDB automates the movement of data within a cluster through replica sets and sharded clusters. But there are occasions when you want to go beyond a single MongoDB cluster and synchronize data to a separate cluster (inter-cluster):
Migrating to MongoDB Atlas
Creating separate development and production environments
Supporting DevOps strategies
Deploying dedicated analytics environments
Meeting locality requirements for auditing and compliance
Maintaining preparedness for a stressed exit
Moving data to the edge
Cluster-to-Cluster Sync provides continuous data synchronization (uni-directional) or a one-time data migration between two MongoDB clusters.
The mongosync binary is the primary process used in Cluster-to-Cluster Sync. Mongosync migrates data from one cluster to another and can keep the clusters in continuous sync.
Available from MongoDB version 6.0.13 onward
Mongosync does not synchronize users or roles (you can create users with different access permissions on each cluster)
You can start, pause, stop, resume or commit the synchronization; you can even reverse its direction
The destination cluster does not accept write operations during the sync process
Two different clusters can be continuously kept in sync, and this synchronization can be stopped and triggered again whenever needed.
Mongosync is downloaded as a standalone tool, separately from the MongoDB server
The mongosync utility can be hosted on its own hardware (for minimal impact on the mongod instances)
Case example:
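As a minimal sketch of how such a sync might be driven, the Python below builds the command line that launches mongosync and prepares requests for its HTTP API. It assumes mongosync listens on its default port 27182 on localhost; the connection strings and the helper names (`mongosync_command`, `api_request`, `start_request`) are hypothetical and only serve the illustration. Nothing is sent until you pass a request to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import List, Optional

# Assumption: mongosync runs locally and listens on its default port.
MONGOSYNC_API = "http://localhost:27182/api/v1"


def mongosync_command(source_uri: str, destination_uri: str) -> List[str]:
    """Build the argv that launches mongosync between two clusters."""
    # cluster0 and cluster1 are the names mongosync gives to the two clusters.
    return ["mongosync", "--cluster0", source_uri, "--cluster1", destination_uri]


def api_request(action: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Prepare (but do not send) a request to the mongosync HTTP API."""
    if action == "progress":
        # progress is a read-only endpoint, queried with GET
        return urllib.request.Request(f"{MONGOSYNC_API}/progress", method="GET")
    # start, pause, resume and commit are POST endpoints with a JSON body
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        f"{MONGOSYNC_API}/{action}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_request() -> urllib.request.Request:
    """Request that starts syncing from cluster0 to cluster1."""
    return api_request("start", {"source": "cluster0", "destination": "cluster1"})
```

Once the mongosync process is running, `urllib.request.urlopen(start_request())` would begin the sync; `api_request("pause")`, `api_request("resume")` and `api_request("commit")` follow the same pattern, and `api_request("progress")` reports the current state.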
If this process cannot be applied in your case, other strategies are possible. The best way to move data from cluster to cluster also depends on the instance size, the latency between the cluster servers, the workload, the allowed downtime, and so on.
You can use mongodump and mongorestore
Advantages
You can synchronize one or more databases or collections
You can synchronize partial data (not the entire replica set)
Drawbacks
For large data sets, latency and transfer cost between clusters must be kept in mind
There is no continuous synchronization: this is a one-time sync that you can repeat as many times as you want
You have to rebuild your replica set after you restore
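The dump-and-restore approach can be sketched as follows. This is an illustrative outline, not a production script: it assumes both tools are on the PATH, that the connection strings carry no database component, and the helper names (`dump_command`, `restore_command`, `copy_namespace`) are hypothetical. The dump is streamed straight into the restore as a gzip-compressed archive, so no intermediate files are written.

```python
import subprocess
from typing import List, Optional


def dump_command(source_uri: str, db: str, collection: Optional[str] = None) -> List[str]:
    """Build a mongodump command that writes a gzip archive to stdout."""
    cmd = ["mongodump", f"--uri={source_uri}", f"--db={db}", "--archive", "--gzip"]
    if collection:
        # Restrict the dump to a single collection (partial data)
        cmd.append(f"--collection={collection}")
    return cmd


def restore_command(dest_uri: str, db: str, collection: Optional[str] = None,
                    drop: bool = False) -> List[str]:
    """Build a mongorestore command that reads a gzip archive from stdin."""
    namespace = f"{db}.{collection}" if collection else f"{db}.*"
    cmd = ["mongorestore", f"--uri={dest_uri}", "--archive", "--gzip",
           f"--nsInclude={namespace}"]
    if drop:
        # Drop each target collection before restoring it, useful when repeating the sync
        cmd.append("--drop")
    return cmd


def copy_namespace(source_uri: str, dest_uri: str, db: str,
                   collection: Optional[str] = None) -> bool:
    """Pipe mongodump into mongorestore; return True if both succeed."""
    dump = subprocess.Popen(dump_command(source_uri, db, collection),
                            stdout=subprocess.PIPE)
    restore = subprocess.run(restore_command(dest_uri, db, collection, drop=True),
                             stdin=dump.stdout)
    dump.stdout.close()
    return dump.wait() == 0 and restore.returncode == 0
```

Calling `copy_namespace(source_uri, dest_uri, "shop", "orders")` would copy a single (hypothetical) collection; omitting the collection copies the whole database. Because each run is a one-time copy, it has to be repeated to pick up later changes.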
You can use mongomirror, but it only works when you want to move data from an existing MongoDB replica set to Atlas.
You can use Kafka and the MongoDB Kafka Connector, although this can add cost and may require additional technical expertise.
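To give an idea of what the Kafka route involves, here is a hedged sketch of a source and a sink connector definition, expressed as the JSON payloads one would register with the Kafka Connect REST API. It assumes a Connect worker at `http://localhost:8083` with the MongoDB Kafka Connector installed and converter settings left to the worker defaults; the connector names, connection strings, and helper functions are hypothetical.

```python
import json
import urllib.request

# Assumption: a Kafka Connect worker exposes its REST API here.
CONNECT_URL = "http://localhost:8083/connectors"


def source_connector(name: str, source_uri: str, db: str, collection: str) -> dict:
    """Connector that publishes change events from the source cluster to Kafka."""
    return {"name": name, "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": source_uri,
        "database": db,
        "collection": collection,
    }}


def sink_connector(name: str, dest_uri: str, db: str, collection: str) -> dict:
    """Connector that applies those change events to the destination cluster."""
    return {"name": name, "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "connection.uri": dest_uri,
        # With no topic prefix, the source publishes to "<database>.<collection>"
        "topics": f"{db}.{collection}",
        "database": db,
        "collection": collection,
        # Interpret each message as a MongoDB change stream event
        "change.data.capture.handler":
            "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler",
    }}


def register_request(connector: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST that registers a connector."""
    return urllib.request.Request(
        CONNECT_URL,
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing each prepared request to `urllib.request.urlopen` would register the connector. The extra moving parts (Kafka brokers, a Connect worker, two connectors per collection) are where the additional cost and expertise come from.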