MongoDB Sharding Cluster with Docker Compose

Updated: March 31, 2025

What is MongoDB Sharding?

  • A method for distributing data across multiple machines
  • Supports deployments with very large datasets and high throughput
  • Enables horizontal scaling for growing applications
  • Allows for automated data balancing across shards
  • Provides high availability and fault tolerance
  • Essential for applications that exceed single server capacity

[Diagram: MongoDB Sharding Architecture. The application connects to mongos (router); mongos tracks metadata via the config server replica set, and data is distributed across the shard replica sets (Shard 1, Shard 2, Shard 3).]

Components of a Sharded Cluster

  • Shards: Store chunks of sharded data (each is a replica set)
  • Config Servers: Store metadata and configuration settings
  • Router (mongos): Routes queries and operations to appropriate shards
  • Recommended production minimum: a 3-member config server replica set, 2 or more shards, and at least one mongos router (two or more for availability)
  • Each shard and config server should be a replica set for redundancy
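
Once a cluster is running, a quick way to confirm which component type a given connection points at is db.hello(): on a mongos the reply includes msg: "isdbgrid", while replica set members report their set name instead.

// Identity check, runnable against any node in the cluster
db.hello()
// On mongos the output includes msg: "isdbgrid";
// on a shard or config member it includes setName (e.g. "shard1_rs")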

Setting Up with Docker Compose

Below is our complete docker-compose.yml file for creating a sharded MongoDB cluster:

services:
  configsvr1:
    container_name: configsvr1
    image: mongo:latest
    command: mongod --configsvr --replSet config_rs --dbpath /data/db --port 27017
    ports:
      - "10001:27017"
    volumes:
      - sharding_mongodb_configsvr1:/data/db

  configsvr2:
    container_name: configsvr2
    image: mongo:latest
    command: mongod --configsvr --replSet config_rs --dbpath /data/db --port 27017
    ports:
      - "10002:27017"
    volumes:
      - sharding_mongodb_configsvr2:/data/db

  configsvr3:
    container_name: configsvr3
    image: mongo:latest
    command: mongod --configsvr --replSet config_rs --dbpath /data/db --port 27017
    ports:
      - "10003:27017"
    volumes:
      - sharding_mongodb_configsvr3:/data/db

  mongos:
    depends_on:
      - configsvr1
      - configsvr2
      - configsvr3
      - shardsvr1_1
      - shardsvr1_2
      - shardsvr1_3
      - shardsvr2_1
      - shardsvr2_2
      - shardsvr2_3
    container_name: mongos
    image: mongo:latest
    command: mongos --configdb config_rs/configsvr1:27017,configsvr2:27017,configsvr3:27017 --port 27017 --bind_ip_all
    ports:
      - "30000:27017"

  shardsvr1_1:
    container_name: shardsvr1_1
    image: mongo:latest
    command: mongod --shardsvr --replSet shard1_rs --dbpath /data/db --port 27017
    ports:
      - "20001:27017"
    volumes:
      - sharding_mongodb_shardsvr1_1:/data/db

  shardsvr1_2:
    container_name: shardsvr1_2
    image: mongo:latest
    command: mongod --shardsvr --replSet shard1_rs --dbpath /data/db --port 27017
    ports:
      - "20002:27017"
    volumes:
      - sharding_mongodb_shardsvr1_2:/data/db

  shardsvr1_3:
    container_name: shardsvr1_3
    image: mongo:latest
    command: mongod --shardsvr --replSet shard1_rs --dbpath /data/db --port 27017
    ports:
      - "20003:27017"
    volumes:
      - sharding_mongodb_shardsvr1_3:/data/db

  shardsvr2_1:
    container_name: shardsvr2_1
    image: mongo:latest
    command: mongod --shardsvr --replSet shard2_rs --dbpath /data/db --port 27017
    ports:
      - "20004:27017"
    volumes:
      - sharding_mongodb_shardsvr2_1:/data/db

  shardsvr2_2:
    container_name: shardsvr2_2
    image: mongo:latest
    command: mongod --shardsvr --replSet shard2_rs --dbpath /data/db --port 27017
    ports:
      - "20005:27017"
    volumes:
      - sharding_mongodb_shardsvr2_2:/data/db

  shardsvr2_3:
    container_name: shardsvr2_3
    image: mongo:latest
    command: mongod --shardsvr --replSet shard2_rs --dbpath /data/db --port 27017
    ports:
      - "20006:27017"
    volumes:
      - sharding_mongodb_shardsvr2_3:/data/db

volumes:
  sharding_mongodb_configsvr1:
  sharding_mongodb_configsvr2:
  sharding_mongodb_configsvr3:
  sharding_mongodb_shardsvr1_1:
  sharding_mongodb_shardsvr1_2:
  sharding_mongodb_shardsvr1_3:
  sharding_mongodb_shardsvr2_1:
  sharding_mongodb_shardsvr2_2:
  sharding_mongodb_shardsvr2_3:
        
[Diagram: MongoDB Sharded Cluster with Docker Compose. A config server replica set (configsvr1, configsvr2, configsvr3, each on port 27017) backs mongos:27017, which routes to the Shard 1 replica set (shardsvr1_1, shardsvr1_2, shardsvr1_3) and the Shard 2 replica set (shardsvr2_1, shardsvr2_2, shardsvr2_3).]

Step-by-Step Setup Guide

Let's walk through the process of setting up and configuring our MongoDB sharding cluster.

Step 1: Start the Containers

First, we need to launch all the containers defined in our docker-compose.yml file:

docker-compose up -d
        

This command creates and starts all the containers in detached mode. Each container runs a MongoDB instance with a specific role: config server, shard server, or router.

Step 2: Configure the Config Server Replica Set

Connect to one of the config servers and initialize the replica set:

mongosh mongodb://localhost:10001
        

Once connected, initialize the config server replica set:

rs.initiate(
  {
    _id: "config_rs",
    configsvr: true,
    members: [
      { _id : 0, host : "configsvr1:27017" },
      { _id : 1, host : "configsvr2:27017" },
      { _id : 2, host : "configsvr3:27017" }
    ]
  }
)
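
The election usually completes within a few seconds. If you want to script the wait, here is a minimal sketch for the same mongosh session (sleep() is a mongosh built-in that takes milliseconds):

// Loop until this member becomes, or learns of, a primary
while (!db.hello().isWritablePrimary && !db.hello().primary) {
  print("waiting for a primary...");
  sleep(1000);
}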
        

You can check the status of the replica set to ensure it's properly configured:

rs.status()
        
[Diagram: Config Server Replica Set Initialization. configsvr1:27017 is PRIMARY, replicating to configsvr2:27017 and configsvr3:27017, both SECONDARY.]

Step 3: Configure the Shard Replica Sets

Now we need to configure the replica sets for each shard. Connect to the first server in Shard 1:

mongosh mongodb://localhost:20001
        

Initialize the first shard's replica set:

rs.initiate(
  {
    _id: "shard1_rs",
    members: [
      { _id : 0, host : "shardsvr1_1:27017" },
      { _id : 1, host : "shardsvr1_2:27017" },
      { _id : 2, host : "shardsvr1_3:27017" }
    ]
  }
)
        

Connect to the first server in Shard 2:

mongosh mongodb://localhost:20004
        

Initialize the second shard's replica set:

rs.initiate(
  {
    _id: "shard2_rs",
    members: [
      { _id : 0, host : "shardsvr2_1:27017" },
      { _id : 1, host : "shardsvr2_2:27017" },
      { _id : 2, host : "shardsvr2_3:27017" }
    ]
  }
)
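
As with the config servers, give each election a moment to settle. A compact status summary, runnable on any member of either shard set:

// One line per member: host plus current state
rs.status().members.map(m => m.name + " " + m.stateStr)
// Expect one PRIMARY and two SECONDARY entries once the election completes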
        

Step 4: Add Shards to the Cluster

With the replica sets initialized, we now need to connect to the mongos router and add the shards to the cluster:

mongosh mongodb://localhost:30000
        

Add both shards to the cluster:

sh.addShard("shard1_rs/shardsvr1_1:27017,shardsvr1_2:27017,shardsvr1_3:27017")
sh.addShard("shard2_rs/shardsvr2_1:27017,shardsvr2_2:27017,shardsvr2_3:27017")
        

You can verify that the shards were added correctly:

sh.status()
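
sh.status() prints a human-readable report. If you want the shard list as a plain document instead (for scripting, say), the listShards admin command returns the same information:

// Machine-readable shard list; run against mongos
db.adminCommand({ listShards: 1 })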
        
[Diagram: Adding Shards to the Cluster. mongos:27017 connects to the Shard 1 and Shard 2 replica sets, each with one primary (P) and two secondaries (S).]

Step 5: Enable Sharding for a Database

Now that we have our shards added to the cluster, we can enable sharding for a database and collection. Still connected to the mongos router:

// Create and switch to our database
use myShardedDB

// Enable sharding for the database
sh.enableSharding("myShardedDB")
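
Every database is also assigned a primary shard, which holds its unsharded collections. You can see which shard was chosen by inspecting the cluster metadata:

// The "primary" field names the shard hosting this database's unsharded collections
db.getSiblingDB("config").databases.findOne({ _id: "myShardedDB" })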
  

Step 6: Shard a Collection

For data to be distributed across shards, we need to shard a collection using a shard key:

// Create a collection
db.createCollection("users")

// Create an index on the field we'll use as shard key
db.users.createIndex({ userId: "hashed" })

// Shard the collection using a hashed shard key
sh.shardCollection("myShardedDB.users", { userId: "hashed" })
  

By using a hashed shard key, MongoDB will evenly distribute the data across our shards.

[Diagram: Shard Key Distribution. Documents inserted through mongos:27017 pass through the hashing function (userId: "hashed"); chunks 1 and 3 land on Shard 1, chunks 2 and 4 on Shard 2.]
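
A hashed key is not the only option. If your queries target contiguous ranges (per-tenant lookups, for instance), a ranged compound key may fit better. A sketch, using hypothetical tenantId and orderDate fields on an orders collection:

// Ranged compound key on hypothetical fields; choose fields your queries filter on
db.orders.createIndex({ tenantId: 1, orderDate: 1 })
sh.shardCollection("myShardedDB.orders", { tenantId: 1, orderDate: 1 })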

Step 7: Test the Sharded Cluster

Now let's insert some test data to see our sharded cluster in action:

// Build the test documents, then insert them in one batch
// (much faster than 10,000 separate insertOne calls)
const docs = [];
for (let i = 1; i <= 10000; i++) {
  docs.push({
    userId: i,
    name: "User " + i,
    email: "user" + i + "@example.com",
    created: new Date()
  });
}
db.users.insertMany(docs);

// Check the chunk distribution
sh.status()

You should see that MongoDB has distributed the chunks across our two shards. As your data grows, MongoDB will continue to split and migrate chunks to maintain an even distribution.
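
For a per-collection view rather than the full cluster report, getShardDistribution() summarizes document counts and chunks on each shard:

// Per-shard breakdown for the users collection; run against mongos
db.users.getShardDistribution()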

Common Operations and Management

Here are some common operations you might need when managing your sharded cluster:

Checking Shard Status

// Get overall shard status
sh.status()

// Get database distribution
use config
db.databases.find()

// Get detailed chunk distribution (in MongoDB 5.0+ the chunks are keyed by
// collection UUID rather than by namespace)
db.chunks.find().pretty()
  

Balancing Operations

The balancer runs automatically to distribute chunks evenly. You can manage it as follows:

// Check if balancer is running
sh.isBalancerRunning()

// Start the balancer
sh.startBalancer()

// Stop the balancer
sh.stopBalancer()

// Set balancer window (only run during off-peak hours)
use config
db.settings.updateOne(
  { _id: "balancer" },
  { $set: { activeWindow: { start: "01:00", stop: "05:00" } } },
  { upsert: true }
)
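
You can then confirm the stored window and the balancer's enabled state:

// Check the stored window document and whether the balancer is enabled
db.getSiblingDB("config").settings.findOne({ _id: "balancer" })
sh.getBalancerState()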
  

Adding a New Shard

As your data grows, you may need to add more shards. To add a new shard:

  1. Add the new shard servers to your docker-compose.yml file
  2. Start the new containers with docker-compose up -d
  3. Initialize the new shard's replica set
  4. Add the new shard to the cluster using sh.addShard() (see the example below)
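
For example, if you brought up a hypothetical third replica set named shard3_rs following the same naming pattern as above, the final step would be:

// Hypothetical third shard, mirroring the naming pattern used above
sh.addShard("shard3_rs/shardsvr3_1:27017,shardsvr3_2:27017,shardsvr3_3:27017")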

Cleanup

When you're done experimenting with your sharded cluster, you can clean up:

# Stop all containers
docker-compose down

# If you want to remove volumes (this will delete all data)
docker volume ls --filter driver=local --format "{{.Name}}" | grep "sharding_mongodb" | xargs -r docker volume rm
        

Best Practices for MongoDB Sharding

  • Choose the right shard key for your workload pattern
  • Consider hashed shard keys for even distribution
  • Use compound shard keys for mixed query patterns
  • Monitor chunk distribution regularly
  • Implement proper backup strategies for each shard
  • Use dedicated servers for config servers in production
  • Place multiple mongos routers for high availability
  • Plan your cluster size based on data growth projections
  • Configure appropriate write concern based on your durability needs (sketch below)
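
As a sketch of that last point: write concern can be set per operation in mongosh, and w: "majority" waits for a majority of the target shard's replica set to acknowledge the write:

// Require majority acknowledgment for this write (values are illustrative)
db.users.insertOne(
  { userId: 10001, name: "User 10001", email: "user10001@example.com", created: new Date() },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)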

Common Sharding Challenges

Jumbo Chunks

When chunks grow too large for MongoDB to migrate, they are marked as "jumbo." This can happen with a poorly chosen shard key. To resolve them, you may need to:

  • Manually split the jumbo chunks (see the sketch below)
  • Consider resharding with a better shard key
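
A sketch of the first option, assuming a ranged shard key (hashed chunks split on hashed values, so this form applies to ranged keys). sh.splitFind() splits the chunk containing the matching document at its median point, shown here with the hypothetical orders collection from earlier:

// Split the chunk containing this shard-key value at its median
sh.splitFind("myShardedDB.orders", { tenantId: "acme", orderDate: ISODate("2025-01-01") })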

Hotspots

When certain shards receive disproportionately high traffic, they become "hotspots." This often happens with monotonically increasing shard keys (like timestamps).

  • Use hashed shard keys to distribute traffic evenly
  • Consider pre-splitting chunks for new collections (sketch below)
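
A sketch of pre-splitting, using a hypothetical events collection with a ranged key: shard the collection while it is empty, then split at boundary values you expect in the data, so the balancer can spread the chunks before load arrives:

// Hypothetical ranged-key collection, pre-split at known boundary values
sh.shardCollection("myShardedDB.events", { region: 1 })
sh.splitAt("myShardedDB.events", { region: "eu" })
sh.splitAt("myShardedDB.events", { region: "us" })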

[Diagram: Common Sharding Patterns. A hashed shard key gives even chunk distribution and balanced traffic; a monotonic shard key gives uneven distribution, with a jumbo chunk "hotspot" holding recent data and drawing heavy traffic.]

Conclusion

MongoDB sharding provides a powerful solution for horizontally scaling your database. With Docker Compose, you can easily set up a local sharded cluster for development and testing. This helps you understand the concepts and prepare for a production deployment.

Remember that while this setup is excellent for learning and development, a production sharding setup would require additional considerations for security, monitoring, and backup strategies. Always thoroughly test your sharding strategy before implementing it in a production environment.

By following this guide, you've learned how to:

  • Set up a complete MongoDB sharding infrastructure with Docker Compose
  • Configure config server and shard replica sets
  • Enable sharding for databases and collections
  • Distribute data across shards using appropriate shard keys
  • Monitor and manage your sharded cluster