How to set up your own MongoDB sharded cluster for development on a single host

In my current job I am using MongoDB, as I need to deal with high volumes of generated data and this NoSQL database is able to scale in a straightforward and automatic way.

In future posts I will probably cover other things related to this database engine that I have encountered while working with it.

In this post I will start with a basic thing: how to set up a MongoDB sharded cluster on your local machine.

If you don't know what a MongoDB sharded cluster is or what its purpose is, I highly recommend reading the sharding section of the MongoDB documentation.

Components of a sharded cluster

Every sharded cluster has three main components:

  • Shards: These are the places where the data is actually stored. Each shard can be a single mongod instance or a replica set.
  • Config Servers: The config servers hold the metadata about the cluster. They are in charge of keeping track of which shard holds each piece of data.
  • Query Routers: The query routers are the point of interaction between the clients and the shards. They use the information in the config servers to retrieve the data from the shards.

For development purposes I am going to use three mongod instances as shards, exactly one mongod instance as the config server and one mongos instance as the query router.

For now I am not going to set up replica sets for the shards; I will leave that for a future post.

It is important to remember that, due to MongoDB restrictions, the number of config servers needs to be either one or three.
In a production environment you need three to guarantee redundancy, but for a development environment one is enough.

Setting up the sharded cluster

Folder structure

We are going to store the whole cluster inside a single folder; this makes it easier to manage the cluster when needed.

For that we create the following folder structure in some location; in my case I am going to use the / directory:

  • /mongocluster/
  • /mongocluster/mongod1/
  • /mongocluster/mongod1/logs/
  • /mongocluster/mongod1/data/
  • /mongocluster/mongod2/
  • /mongocluster/mongod2/logs/
  • /mongocluster/mongod2/data/
  • /mongocluster/mongod3/
  • /mongocluster/mongod3/logs/
  • /mongocluster/mongod3/data/
  • /mongocluster/mongoc/
  • /mongocluster/mongoc/logs/
  • /mongocluster/mongoc/data/
  • /mongocluster/mongos/
  • /mongocluster/mongos/logs/
  • /mongocluster/mongos/data/

The mongodN folders will be used for the shards, mongoc for the config server and mongos for the query router.
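
If you prefer not to create all these folders by hand, a small shell snippet like the following should produce the same structure (assuming you have write permission on the chosen location):

#!/bin/bash

# Create the logs and data folders for the three shards,
# the config server and the query router
for node in mongod1 mongod2 mongod3 mongoc mongos; do
    mkdir -p /mongocluster/$node/logs /mongocluster/$node/data
done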

Configuration Files

Once the folder structure has been created, we proceed to create the configuration files for each of the processes.

We are going to use YAML configuration files, which are the format used by newer versions of MongoDB (2.6 and later).

If you intend to use a version of MongoDB older than 2.6, you will need to check MongoDB's documentation to see how to translate these files to the old configuration format.

The configuration files I am going to show are the most basic ones needed to have the cluster up and running.
If you need authentication or SSL you can add those options to the configuration.

Shards configuration

For each shard we are going to use the following configuration template:

systemLog:
  destination: file
  path: "/mongocluster/mongodN/logs/mongodN.log"
  logAppend: true
processManagement:
  pidFilePath: "/mongocluster/mongodN/mongodN.pid"
  fork: true
net:
  bindIp: 127.0.0.1
  port: UNIQUE_PORT
storage:
  dbPath: "/mongocluster/mongodN/data"
  directoryPerDB: true
sharding:
  clusterRole: shardsvr
operationProfiling:
  mode: all

We are going to create a mongodN.conf file inside each of the mongodN folders, replacing N with the corresponding shard number.
It is also important to assign a different port to each of the shards; of course, these ports have to be available on the host.

For example, /mongocluster/mongod1/mongod1.conf can look like this:

systemLog:
  destination: file
  path: "/mongocluster/mongod1/logs/mongod1.log"
  logAppend: true
processManagement:
  pidFilePath: "/mongocluster/mongod1/mongod1.pid"
  fork: true
net:
  bindIp: 127.0.0.1
  port: 47018
storage:
  dbPath: "/mongocluster/mongod1/data"
  directoryPerDB: true
sharding:
  clusterRole: shardsvr
operationProfiling:
  mode: all

The important things to notice here are:

  • Make sure dbPath under the storage section points to the correct place; if two of the shards point to the same data directory you will run into issues with the files mongod creates for normal operation.
  • sharding.clusterRole is the essential part of this configuration: it indicates that the mongod instance is part of a sharded cluster and that its role is to be a data shard.

Config server

The configuration file for the config server is identical to the shards' configuration, except that sharding.clusterRole must be set to configsvr.

Here is my configuration file for the config server, /mongocluster/mongoc/mongoc.conf:

systemLog:
  destination: file
  path: "/mongocluster/mongoc/logs/mongoc.log"
  logAppend: true
processManagement:
  pidFilePath: "/mongocluster/mongoc/mongoc.pid"
  fork: true
net:
  bindIp: 127.0.0.1
  port: 47019
storage:
  dbPath: "/mongocluster/mongoc/data"
  directoryPerDB: true
sharding:
  clusterRole: configsvr
operationProfiling:
  mode: all

Query router (Mongos)

The configuration of the query router is pretty simple. The important part of it is the sharding.configDB value.
The value needs to be a string containing the config server's location in the form <host>:<port>.
If you have a cluster with three config servers, you need to put the locations of all three in the string, separated by commas.

Important: if you have more than one query router, make sure you use exactly the same string for the sharding.configDB in every query router.
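
For example, with three config servers the sharding section of every query router's configuration would look something like this (the hostnames here are just placeholders):

sharding:
  configDB: "configsvr1:47019,configsvr2:47019,configsvr3:47019"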

This is the configuration file for the query router, which we'll locate at /mongocluster/mongos/mongos.conf:

systemLog:
  destination: file
  path: "/mongocluster/mongos/logs/mongos.log"
  logAppend: true
processManagement:
  pidFilePath: "/mongocluster/mongos/mongos.pid"
  fork: true
net:
  bindIp: 127.0.0.1
  port: 47017
sharding:
  configDB: "localhost:47019"

Running the sharded cluster

Once the folder structure and the configuration files have been created, we are ready to start all of the cluster's components.

The order in which the components should be started is the following:

  1. Shards
  2. Config servers
  3. Query routers

Launching each of the components is straightforward.
For each of the shards and for the config server we need to launch a mongod process with the corresponding configuration file.
Like this:

mongod --config <path_to_config>

For the query router, we need to launch a mongos instance with its configuration file:

mongos -f <path_to_config>

We can create a simple bash script that will launch all the required instances.
I call it start-mongo-cluster.sh and it has the following content:

#!/bin/bash

#Start the mongod shard instances
mongod --config /mongocluster/mongod1/mongod1.conf
mongod --config /mongocluster/mongod2/mongod2.conf
mongod --config /mongocluster/mongod3/mongod3.conf

#Start the mongod config server instance
mongod --config /mongocluster/mongoc/mongoc.conf

#Start the mongos
mongos -f /mongocluster/mongos/mongos.conf
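
After running the script you can quickly check that the five processes (the three shards, the config server and the mongos) are actually up, for example with ps:

ps -ef | grep -E "mongod|mongos" | grep -v grep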

To stop the cluster we just need to stop the started instances.

For that we are going to use the kill command.
In order to use it, we need the PID of each of the processes.
That is why we added processManagement.pidFilePath to the configuration files of the components: each instance stores its PID in its own file, making it easy to find the process to kill when we want to shut down the cluster.

The following script shuts down each of the processes, provided its PID file exists:

#!/bin/bash

#Stop mongos
PID_MONGOS_FILE=/mongocluster/mongos/mongos.pid
if [ -e $PID_MONGOS_FILE ]; then
    PID_MONGOS=$(cat $PID_MONGOS_FILE)
    kill $PID_MONGOS
    rm $PID_MONGOS_FILE
fi

#Stop mongo config
PID_MONGOC_FILE=/mongocluster/mongoc/mongoc.pid
if [ -e $PID_MONGOC_FILE ]; then
    PID_MONGOC=$(cat $PID_MONGOC_FILE)
    kill $PID_MONGOC
    rm $PID_MONGOC_FILE
fi

#Stop mongod shard instances
PID_MONGOD1_FILE=/mongocluster/mongod1/mongod1.pid
if [ -e $PID_MONGOD1_FILE ]; then
    PID_MONGOD1=$(cat $PID_MONGOD1_FILE)
    kill $PID_MONGOD1
    rm $PID_MONGOD1_FILE
fi

PID_MONGOD2_FILE=/mongocluster/mongod2/mongod2.pid
if [ -e $PID_MONGOD2_FILE ]; then
    PID_MONGOD2=$(cat $PID_MONGOD2_FILE)
    kill $PID_MONGOD2
    rm $PID_MONGOD2_FILE
fi

PID_MONGOD3_FILE=/mongocluster/mongod3/mongod3.pid
if [ -e $PID_MONGOD3_FILE ]; then
    PID_MONGOD3=$(cat $PID_MONGOD3_FILE)
    kill $PID_MONGOD3
    rm $PID_MONGOD3_FILE
fi
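
Remember to make both scripts executable before using them (the stop script name below is just an example, use whatever name you saved it under):

chmod +x start-mongo-cluster.sh stop-mongo-cluster.sh
./start-mongo-cluster.sh   # bring the whole cluster up
./stop-mongo-cluster.sh    # shut it down again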

The sharded cluster is now almost ready to be used, but the config server still has no idea of the existing shards.

What we need to do is register the shards we created with the config server.
In order to do that we connect to the cluster with the mongo client, pointing it at the query router, like this:

$ mongo localhost:47017

Once we are connected we need to issue the following commands to add the shards to the cluster (using the port we assigned to each shard; in my case 47018, 48018 and 49018):

mongos> sh.addShard("localhost:47018")
mongos> sh.addShard("localhost:48018")
mongos> sh.addShard("localhost:49018")
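
To verify that the shards were registered correctly we can run sh.status() in the same session; it prints the sharding status, including the list of shards the cluster knows about:

mongos> sh.status()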

And we're ready to go!