Rakesh Tholiya
Written By Rakesh Tholiya NodeJS Team Lead

MongoDB Sharding: A Step-by-Step Guide

post date
December 27, 2023
MongoDB Sharding A Step-by-Step Guide
Reading Time: 6 minutes

What is Database Sharding?

Sharding is a method for distributing data across multiple servers. MongoDB uses sharding to support deployments with very large databases and high throughput query operations.

Database systems with large data sets or high throughput applications can challenge the capacity of a single server. For example, high query rates can exhaust the CPU capacity of the server. Working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.

There are two methods for addressing system growth: vertical and horizontal scaling.

Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space.

Horizontal Scaling involves dividing the database and loading over multiple servers, adding additional servers to increase capacity as required. While the overall speed or capacity of a single machine may not be high, each machine handles a subset of the overall workload, potentially providing better efficiency than a single high-speed high-capacity server.

MongoDB supports horizontal scaling through sharding.

What is a Sharded Cluster?

A MongoDB sharded cluster consists of the following components:

  1. config servers: Config servers store metadata and configuration settings for the cluster like user details, shard details, and configuration.
  2. shard: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set and each shard should be in a different server.
  3. Mongos: The Mongos acts as a query router, providing an interface between client applications and the sharded cluster. Mongos uses stored metadata in the config server and fetches data from shared based on shard configuration.

Sharded Cluster

Sharded Cluster in Local or On-premise

We will create a sharded cluster with the below details.

  1. One Config Server with 3 replica sets.
  2. Two Shards with 3 replica sets in each.
  3. One Router (mongos)

According to the above details, we need 3 servers (1 for each replica) for Config Server, 3 servers (1 for each replica) for Shard1 same as 3 servers for Shard2, and one server for Router (mongos).

So we need a total of 3 + 3 + 3 + 1 = 10 servers. It’s not possible locally to manage this many systems so we will use 10 docker containers for sharded clusters.

Note: while setting up production sharded clusters it’s not compulsory to have 10 servers you can set this in 3 servers also, based on your production uses and the scalability you want.

Prerequisite for MongoDB

  • One system with min 4GB RAM and min 2 CPU
  • Proper Internet connectivity.
  • Linux OS
  • Docker and docker-compose installed
  • Basic docker knowledge

Setup and Installation

In this sharded cluster we are going to use Mongodb v7.0.4 version, for that we will use the mongo:7.0.4 docker image.

Config servers

We will deploy the config server in sharded clusters in replica sets. Using a replica set for the config servers improves consistency across the config servers since MongoDB can take advantage of the standard replica set read and write protocols for the config data.

To deploy the config server we will use a docker container, I have created a docker-compose file for the config server, and in this docker-compose file, I have added 3 containers for replica sets.

In this docker-compose file, I have shared volume for all 3 containers of replica sets of config server and I have used 10001, 10002, and 10003 ports for each container, so make sure this port is open.

docker-compose.yml for config server.

version: '3'

services:

  configsvr1:
    container_name: configsvr1
    image: mongo:7.0.4
    command: mongod --configsvr --replSet config_rs --dbpath /data/db --port 27017
    ports:
      - 10001:27017
    volumes:
      - ./configsvr1:/data/db

  configsvr2:
    container_name: configsvr2
    image: mongo:7.0.4
    command: mongod --configsvr --replSet config_rs --dbpath /data/db --port 27017
    ports:
      - 10002:27017
    volumes:
      - ./configsvr2:/data/db

  configsvr3:
    container_name: configsvr3
    image: mongo:7.0.4
    command: mongod --configsvr --replSet config_rs --dbpath /data/db --port 27017
    ports:
      - 10003:27017
    volumes:
      - ./configsvr3:/data/db

Run this file using the “docker-compose up -d” command, it will create 3 containers, once it is up and running, you need to connect any of one container’s Mongodb.

We will connect with the first one 10001, for that you can use mongosh mongodb://<server-ip>:10001 this command, or you can use any cli tools, once you connect you have to run the below query.

Before running the below query replace your server (local machine) ip with <server-ip>.

rs.initiate(
  {
    _id: "config_rs",
    configsvr: true,
    members: [
      { _id : 0, host : "<server-ip>:10001" },
      { _id : 1, host : "<server-ip>:10002" },
      { _id : 2, host : "<server-ip>:10003" }
    ]
  }
)

This query will create 3 replica sets of config servers.

Shard

In this section, we will create 2 shards each shard has 3 replica sets, the same as the config server I have created a docker compose file for both shards.

For shard1 we will use 20001, 20002, and 20003 ports and for shard2 we will use 20004, 20005, and 20006 ports, we will share volume to keep data stored.

docker-compose.yml file for shard1

version: '3'

services:

  shardsvr1_1:
    container_name: shardsvr1_1
    image: mongo:7.0.4
    command: mongod --shardsvr --replSet shard1_rs --dbpath /data/db --port 27017
    ports:
      - 20001:27017
    volumes:
      - ./shardsvr1_1:/data/db

  shardsvr1_2:
    container_name: shardsvr1_2
    image: mongo:7.0.4
    command: mongod --shardsvr --replSet shard1_rs --dbpath /data/db --port 27017
    ports:
      - 20002:27017
    volumes:
      - ./shardsvr1_2:/data/db

  shardsvr1_3:
    container_name: shardsvr1_3
    image: mongo:7.0.4
    command: mongod --shardsvr --replSet shard1_rs --dbpath /data/db --port 27017
    ports:
      - 20003:27017
    volumes:
      - ./shardsvr1_3:/data/db

Run this file using the “docker-compose up -d” command, it will create 3 containers, once it is up and running, you need to connect any of one container’s Mongodb.

We will connect with the first one 20001, for that you can use mongosh mongodb://<server-ip>:20001 this command, or you can use any cli tools, once you connect you have to run the below query.

Before running the below query replace your server (local machine) ip with <server-ip>.

rs.initiate(
  {
    _id: "shard1_rs",
    members: [
      { _id : 0, host : "<server-ip>:20001" },
      { _id : 1, host : "<server-ip>:20002" },
      { _id : 2, host : "<server-ip>:20003" }
    ]
  }
)

This query will create 3 replica sets of shard1.

docker-compose.yml file for shard2

version: '3'

services:

  shardsvr2_1:
    container_name: shardsvr2_1
    image: mongo:7.0.4
    command: mongod --shardsvr --replSet shard2_rs --dbpath /data/db --port 27017
    ports:
      - 20004:27017
    volumes:
      - ./shardsvr2_1:/data/db

  shardsvr2_2:
    container_name: shardsvr2_2
    image: mongo:7.0.4
    command: mongod --shardsvr --replSet shard2_rs --dbpath /data/db --port 27017
    ports:
      - 20005:27017
    volumes:
      - ./shardsvr2_2:/data/db

  shardsvr2_3:
    container_name: shardsvr2_3
    image: mongo:7.0.4
    command: mongod --shardsvr --replSet shard2_rs --dbpath /data/db --port 27017
    ports:
      - 20006:27017
    volumes:
      - ./shardsvr2_3:/data/db

Run this file using the “docker-compose up -d” command, it will create 3 containers, once it is up and running, you need to connect any of one container’s Mongodb.

We will connect with the first one 20004, for that you can use mongosh mongodb://<server-ip>:20004 this command, or you can use any cli tools, once you connect you have to run the below query.

Before running the below query replace your server (local machine) ip with <server-ip>.

rs.initiate(
  {
    _id: "shard2_rs",
    members: [
      { _id : 0, host : "<server-ip>:20004" },
      { _id : 1, host : "<server-ip>:20005" },
      { _id : 2, host : "<server-ip>:20006" }
    ]
  }
)

This query will create 3 replica sets of shard2.

Mongos (router)

MongoDB mongos instances route queries and write operations to shards in a sharded cluster. mongos provide the only interface to a sharded cluster from the perspective of applications. Applications never connect or communicate directly with the shards.

A mongos instance routes a query to a cluster by:

  • Determining the list of shards that must receive the query.
  • Establishing a cursor on all targeted shards.

For Mongos also I have created a docker-compose file, which will create one container and connect to config servers’s replica sets.

For Mongos we will use a 30000 port and there is no need to share volume. In this docker-compose file you need to replace <server-ip> with your server (local machine) ip.

docker-compose.yml file for mongos

version: '3'

services:

  mongos:
    container_name: mongos-route
    image: mongo:7.0.4
    command: mongos --configdb config_rs/<server-ip>:10001,<server-ip>:10002,<server-ip>:10003 --port 27017 --bind_ip_all
    ports:
      - 30000:27017

Now, that our basic setup and installation are done, we need to register shards in the router, in this process the router will store shards data in config servers.

To register shard you need to connect Mongos (router) using mongosh mongodb://<server-ip>:30000 command and need to run the query, but before run query replace <server-ip> with your server (local machine) ip.

sh.addShard("shard1_rs/<server-ip>:20001,<server-ip>:20002,<server-ip>:20003");
sh.addShard("shard2_rs/<server-ip>:20004,<server-ip>:20005,<server-ip>:20006");

This 2 query registers both shards in the router.

Here all our configuration is done, we need to enable sharing in particular databases and need to set shared keys in the collection.

Now connect with the router and run the below query to enable sharding, we will use a test database, in this database we will enable sharding.

Below query will switch to the test database.

use test

Below query will enable sharding in the test database.

sh.enableSharding("test")

We will use user collection and email field of user collection to distribute data across two shards, for that we need to run the below query, and make sure before running this query collection should be empty or not created.

sh.shardCollection("test.users", { email : "hashed" } )

For the sharding key, selection refers to MongoDB’s official document for shardCollection.

Now insert some data in the user’s collection by connecting mongos (router), I will suggest inserting at least 20 data with different emails.

After inserting data run the below query it will show you data distribution across two shards.

db.users.getShardDistribution()

Useful Links

Conclusion

In conclusion, mastering Mongo Sharding is your key to scalability and performance in MongoDB. With this step-by-step guide as your compass, navigate the intricacies, conquer the challenges, and unlock the full potential of distributed database management. Take charge, share smart, and elevate your MongoDB experience to new heights. Happy Sharding!

Rakesh Tholiya
Written By Rakesh Tholiya NodeJS Team Lead

Popular Categories

Get In Touch

Contact
GET IN TOUCH

Consult With Our Experts Today!

Your Benefits!

  • Professional Project Consultation
  • Detailed Project Proposal
Skype Skype Whatsapp WhatsApp