In my current job I am using MongoDB as I need to deal with high-volumes of generated data and this NoSQL database is able to scale in a straight-forward and automatic way.
I will probably be covering different things related to this database engine in my next posts that I encountered while dealing while working with this.
In this post I will start with a basic thing: how to set up a MongoDB sharded cluster in your local machine.
If you don't know what a MongoDB sharded cluster is or what is it purpose I highly recommend reading the sharding section of the MongoDB documentation.
Components of a sharded cluster
Every sharded cluster has three main components:
- Shards: This are the actual places where the data is stored. Each of the shards can be a mongod instance or a replica set.
- Config Servers: The config server has the metadata about the cluster. It is in charge of keeping track of which shard has each piece of data.
- Query Routers: The query routers are the point of interaction between the clients and the shard. The query servers use information from the config servers to retrieve the data from the shards.
For development purposes I am going to use three mongod instances as shards, exactly one mongod instance as config server and one mongos instance to be a query router.
For now I am not going to setup replica-sets for the shards, I am going to leave that for a future post.
It is important to remember that due to mongo restrictions the number of mongo config servers needs to be either one or three.
In a production environment you need to use three to guarantee redundancy but for a development environment with one will be enough.
Setting up the sharded cluster
We are going to store the whole cluster inside a folder, this way it is easier to manage the cluster when needed.
For that we create the following folder structure in some location, in my case I am going to use the
The mongods folders will be used for the shards, mongoc for the config server and mongos for the query router.
Once the folder structure has been created, we proceed to create the configuration files for each of the processes.
We are going to use YAML configuration files, the new version of MongoDB uses this type of configuration file.
If you intend to use a version of MongoDB before 2.6 you will need to go to MongoDB's documentation to see how to translate the config files to the old config file format.
The configuration files I am going to give are the most basic ones to have the cluster up and running.
If you need authentication or SSL you can add these to the configuration.
For each shard we are going to use the following configuration template:
systemLog: destination: file path: "/mongocluster/mongodN/logs/mongodN.log" logAppend: true processManagement: pidFilePath: "/mongocluster/mongodN/mongodN.pid" fork: true net: bindIp: 127.0.0.1 port: UNIQUE_PORT storage: dbPath: "/mongocluster/mongodN/data" directoryPerDB: true sharding: clusterRole: shardsvr operationProfiling: mode: all
We are going to create a mongodN.conf inside each of the mongodN folders, replacing N for the corresponding number of shard.
Also it is important to set a different port to each of the shards, of course these ports have to be available in the host.
For example, for the
/mongocluster/mongod1/mongod1.conf we can have this:
systemLog: destination: file path: "/mongocluster/mongod1/logs/mongod1.log" logAppend: true processManagement: pidFilePath: "/mongocluster/mongod1/mongod1.pid" fork: true net: bindIp: 127.0.0.1 port: 47018 storage: dbPath: "/mongocluster/mongod1/data" directoryPerDB: true sharding: clusterRole: shardsvr operationProfiling: mode: all
The important things to notice here are:
- That dbPath under the storage section is pointing to the correct place, otherwise you might have issues with the files mongod creates for normal operation if two of the shards point to the same data directory.
- The sharding.clusterRole is the essential part of this configuration, it is the one that indicates that the mongod instance is part of a sharded cluster and that its role is to be a data shard.
The configuration file for the server is identical to the shards configuration except for the key difference that in the
sharding.clusterRole we need to set up configsvr as the value.
Here is my configuration file for the server, the
systemLog: destination: file path: "/mongocluster/mongoc/logs/mongoc.log" logAppend: true processManagement: pidFilePath: "/mongocluster/mongoc/mongoc.pid" fork: true net: bindIp: 127.0.0.1 port: 47019 storage: dbPath: "/mongocluster/mongoc/data" directoryPerDB: true sharding: clusterRole: configsvr operationProfiling: mode: "all"
Query router (Mongos)
The configuration of the query router is pretty simple. The important part in it, is the
The value needs to be a string containing the configuration server's location in the form of
If you have a 3-config server cluster you need to put the location of the three configuration servers separated by commas in the string.
Important: if you have more than one query router, make sure you use exactly the same string for the
sharding.configDB in every query router.
This is the configuration file for the query router, which we'll locate at
systemLog: destination: file path: "/mongocluster/mongos/logs/mongos.log" logAppend: true processManagement: pidFilePath: "/mongocluster/mongos/mongos.pid" fork: true net: bindIp: 127.0.0.1 port: 47017 sharding: configDB: "localhost:47019"
Running the sharded cluster
Once the folder structure and the files have been created, we are ready to start all of its components.
The order in which the components should be started is the following:
- Config servers
- Query routers
Launching each of the elements is trivial.
For each of the shards and config servers we need to launch a mongod process with the corresponding configuration file.
mongod --config <path_to_config>
For the query server case, we need to launch a mongos instance with the configuration for the query router:
mongos -f <path_to_config>
We can create a simple bash script that will launch all the required instances.
I call it start-mongo-cluster.sh and it has the following content:
#!/bin/bash #Start the mongod shard instances mongod --config /mongocluster/mongod1/mongod1.conf mongod --config /mongocluster/mongod2/mongod2.conf mongod --config /mongocluster/mongod3/mongod3.conf #Start the mongod config server instance mongod --config /mongocluster/mongoc/mongoc.conf #Start the mongos mongos -f /mongocluster/mongos/mongos.conf
To stop the components we just need to stop the started instances.
For that we are going to use the
In order to use it, we need the PIDs of each of the processes.
For that reason, we added the
processManagement.pidFile to the configuration files of the components: the instances are going to store their PIDs in the those files, making it easy to get the PID of the process to kill when wanting to shutdown the cluster.
The following script shuts down each of the processes in case the PID file exists:
#!/bin/bash #Stop mongos PID_MONGOS_FILE=/mongocluster/mongos/mongos.pid if [ -e $PID_MONGOS_FILE ]; then PID_MONGOS=$(cat $PID_MONGOS_FILE) kill $PID_MONGOS rm $PID_MONGOS_FILE fi #Stop mongo config PID_MONGOC_FILE=/mongocluster/mongoc/mongoc.pid if [ -e $PID_MONGOC_FILE ]; then PID_MONGOC=$(cat $PID_MONGOC_FILE) kill $PID_MONGOC rm $PID_MONGOC_FILE fi #Stop mongod shard instances PID_MONGOD1_FILE=/mongocluster/mongod1/mongod1.pid if [ -e $PID_MONGOD1_FILE ]; then PID_MONGOD1=$(cat $PID_MONGOD1_FILE) kill $PID_MONGOD1 rm $PID_MONGOD1_FILE fi PID_MONGOD2_FILE=/mongocluster/mongod2/mongod2.pid if [ -e $PID_MONGOD2_FILE ]; then PID_MONGOD2=$(cat $PID_MONGOD2_FILE) kill $PID_MONGOD2 rm $PID_MONGOD2_FILE fi PID_MONGOD3_FILE=/mongocluster/mongod3/mongod3.pid if [ -e $PID_MONGOD3_FILE ]; then PID_MONGOD3=$(cat $PID_MONGOD3_FILE) kill $PID_MONGOD3 rm $PID_MONGOD3_FILE fi
We have the sharded cluster almost ready to be used. The configuration server has no idea of the existing shards.
What we need to do is setup the shards we created in the configuration server.
In order to do that we need to connect to the cluster using the mongo client against the query server, like this:
$ mongo localhost:47017
Once we are connected we need to issue the following commands to add the shards to the cluster:
mongos> sh.addShard("localhost:47018") mongos> sh.addShard("localhost:48018") mongos> sh.addShard("localhost:49018")
And we're ready to go!