Developer Playground
Below is our complete docker-compose.yml file for creating a sharded MongoDB cluster:
services: configsvr1: container_name: configsvr1 image: mongo:latest command: mongod --configsvr --replSet config_rs --dbpath /data/db --port 27017 ports: - "10001:27017" volumes: - sharding_mongodb_configsvr1:/data/db configsvr2: container_name: configsvr2 image: mongo:latest command: mongod --configsvr --replSet config_rs --dbpath /data/db --port 27017 ports: - "10002:27017" volumes: - sharding_mongodb_configsvr2:/data/db configsvr3: container_name: configsvr3 image: mongo:latest command: mongod --configsvr --replSet config_rs --dbpath /data/db --port 27017 ports: - "10003:27017" volumes: - sharding_mongodb_configsvr3:/data/db mongos: depends_on: - configsvr1 - configsvr2 - configsvr3 - shardsvr1_1 - shardsvr1_2 - shardsvr1_3 - shardsvr2_1 - shardsvr2_2 - shardsvr2_3 container_name: mongos image: mongo:latest command: mongos --configdb config_rs/configsvr1:27017,configsvr2:27017,configsvr3:27017 --port 27017 --bind_ip_all ports: - "30000:27017" shardsvr1_1: container_name: shardsvr1_1 image: mongo:latest command: mongod --shardsvr --replSet shard1_rs --dbpath /data/db --port 27017 ports: - "20001:27017" volumes: - sharding_mongodb_shardsvr1_1:/data/db shardsvr1_2: container_name: shardsvr1_2 image: mongo:latest command: mongod --shardsvr --replSet shard1_rs --dbpath /data/db --port 27017 ports: - "20002:27017" volumes: - sharding_mongodb_shardsvr1_2:/data/db shardsvr1_3: container_name: shardsvr1_3 image: mongo:latest command: mongod --shardsvr --replSet shard1_rs --dbpath /data/db --port 27017 ports: - "20003:27017" volumes: - sharding_mongodb_shardsvr1_3:/data/db shardsvr2_1: container_name: shardsvr2_1 image: mongo:latest command: mongod --shardsvr --replSet shard2_rs --dbpath /data/db --port 27017 ports: - "20004:27017" volumes: - sharding_mongodb_shardsvr2_1:/data/db shardsvr2_2: container_name: shardsvr2_2 image: mongo:latest command: mongod --shardsvr --replSet shard2_rs --dbpath /data/db --port 27017 ports: - "20005:27017" volumes: - sharding_mongodb_shardsvr2_2:/data/db shardsvr2_3: container_name: shardsvr2_3 image: mongo:latest command: mongod --shardsvr --replSet shard2_rs --dbpath /data/db --port 27017 ports: - "20006:27017" volumes: - sharding_mongodb_shardsvr2_3:/data/db volumes: sharding_mongodb_configsvr1: sharding_mongodb_configsvr2: sharding_mongodb_configsvr3: sharding_mongodb_shardsvr1_1: sharding_mongodb_shardsvr1_2: sharding_mongodb_shardsvr1_3: sharding_mongodb_shardsvr2_1: sharding_mongodb_shardsvr2_2: sharding_mongodb_shardsvr2_3:
Let's walk through the process of setting up and configuring our MongoDB sharding cluster.
First, we need to launch all the containers defined in our docker-compose.yml file:
docker-compose up -d
This command will create and start all the containers in detached mode. Each container will run a MongoDB instance with specific roles (config server, shard server, or router).
Connect to one of the config servers and initialize the replica set:
mongosh mongodb://localhost:10001
Once connected, initialize the config server replica set:
rs.initiate( { _id: "config_rs", configsvr: true, members: [ { _id : 0, host : "configsvr1:27017" }, { _id : 1, host : "configsvr2:27017" }, { _id : 2, host : "configsvr3:27017" } ] } )
You can check the status of the replica set to ensure it's properly configured:
rs.status()
Now we need to configure the replica sets for each shard. Connect to the first server in Shard 1:
mongosh mongodb://localhost:20001
Initialize the first shard's replica set:
rs.initiate( { _id: "shard1_rs", members: [ { _id : 0, host : "shardsvr1_1:27017" }, { _id : 1, host : "shardsvr1_2:27017" }, { _id : 2, host : "shardsvr1_3:27017" } ] } )
Connect to the first server in Shard 2:
mongosh mongodb://localhost:20004
Initialize the second shard's replica set:
rs.initiate( { _id: "shard2_rs", members: [ { _id : 0, host : "shardsvr2_1:27017" }, { _id : 1, host : "shardsvr2_2:27017" }, { _id : 2, host : "shardsvr2_3:27017" } ] } )
With the replica sets initialized, we now need to connect to the mongos router and add the shards to the cluster:
mongosh mongodb://localhost:30000
Add both shards to the cluster:
sh.addShard("shard1_rs/shardsvr1_1:27017,shardsvr1_2:27017,shardsvr1_3:27017") sh.addShard("shard2_rs/shardsvr2_1:27017,shardsvr2_2:27017,shardsvr2_3:27017")
You can verify that the shards were added correctly:
sh.status()
Now that we have our shards added to the cluster, we can enable sharding for a database and collection. Still connected to the mongos router:
// Create and switch to our database use myShardedDB // Enable sharding for the database sh.enableSharding("myShardedDB")
For data to be distributed across shards, we need to shard a collection using a shard key:
// Create a collection db.createCollection("users") // Create an index on the field we'll use as shard key db.users.createIndex({ userId: "hashed" }) // Shard the collection using a hashed shard key sh.shardCollection("myShardedDB.users", { userId: "hashed" })
By using a hashed shard key, MongoDB will evenly distribute the data across our shards.
Now let's insert some test data to see our sharded cluster in action:
// Insert test documents for(let i = 1; i <= 10000; i++) { db.users.insertOne({ userId: i, name: "User " + i, email: "user" + i + "@example.com", created: new Date() }); } // Check the chunk distribution sh.status()
You should see that MongoDB has distributed the chunks across our two shards. As your data grows, MongoDB will continue to split and migrate chunks to maintain an even distribution.
Here are some common operations you might need when managing your sharded cluster:
// Get overall shard status sh.status() // Get database distribution use config db.databases.find() // Get detailed chunk distribution db.chunks.find().pretty()
The balancer runs automatically to distribute chunks evenly. You can manage it as follows:
// Check if balancer is running sh.isBalancerRunning() // Start the balancer sh.startBalancer() // Stop the balancer sh.stopBalancer() // Set balancer window (only run during off-peak hours) use config db.settings.updateOne( { _id: "balancer" }, { $set: { activeWindow: { start: "01:00", stop: "05:00" } } }, { upsert: true } )
As your data grows, you may need to add more shards. To add a new shard:
docker-compose up -d
sh.addShard()
When you're done experimenting with your sharded cluster, you can clean up:
# Stop all containers docker-compose down # If you want to remove volumes (this will delete all data) docker volume ls --filter driver=local --format "{{.Name}}" | grep "sharding_mongodb" | xargs -r docker volume rm
When chunks become too large for MongoDB to migrate, they're marked as "jumbo." These can occur with poorly chosen shard keys. To resolve, you may need to:
When certain shards receive disproportionately high traffic, they become "hotspots." This often happens with monotonically increasing shard keys (like timestamps).
MongoDB sharding provides a powerful solution for horizontally scaling your database. With Docker Compose, you can easily set up a local sharded cluster for development and testing. This helps you understand the concepts and prepare for a production deployment.
Remember that while this setup is excellent for learning and development, a production sharding setup would require additional considerations for security, monitoring, and backup strategies. Always thoroughly test your sharding strategy before implementing it in a production environment.
By following this guide, you've learned how to: