May 2012 A 10gen White Paper MongoDB on Red Hat Enterprise Linux By: Sandeep Parikh, Technical Product Manager, 10gen & Sanjay Rao, Principal Performance Engineer, Red Hat
    Table of Contents

    Introduction

    About Red Hat Enterprise Linux

    About MongoDB

      MongoDB Components

      Deployment Architectures

    MongoDB on Red Hat Enterprise Linux

      Installing MongoDB

      Storage Configuration

      Securing MongoDB

      Database Backups

      System Monitoring

    Summary

    Introduction

    Red Hat Enterprise Linux (RHEL) powers mission-critical applications for thousands of enterprises and delivers performance, reliability, scalability, and security. Certified by the leading hardware and software vendors, it is suitable for, and has been deployed on, desktops, servers, and mainframes. It has also completed the most stringent government security certifications, safeguarding systems and data. RHEL features virtualization, mandatory access control (MAC) security, high-availability clustering, modularity, and extensive energy management capabilities.

    MongoDB is a scalable, high-performance, open source data store. MongoDB provides JSON-style document-oriented storage with full index support, auto-sharding, sophisticated replication, and compatibility with the Map-Reduce paradigm. MongoDB focuses on flexibility, power, speed, and ease of use.

    This white paper will help you understand the steps necessary to deploy MongoDB on Red Hat Enterprise Linux 6.2 and take advantage of several features of the underlying system. We will provide an overview of MongoDB and discuss several example deployment scenarios to support high-performance database reads and writes. We will also review the specific steps necessary for deploying MongoDB on RHEL 6.2, including tuning system parameters for performance, reliability, and security. The supplied architectural patterns and system configuration steps can serve as a basis for deploying multi-node database clusters in a production environment.

    About Red Hat Enterprise Linux

    Red Hat Enterprise Linux is built to support mission critical applications and focuses on delivering security, reliability and performance for systems on which it is deployed.

    The features available in RHEL allow it to meet evolving data center needs. Advances in system efficiency lead to higher performance as well as reduced operating costs. Increased scalability meets the need for larger systems with more CPU sockets and cores, more memory, and larger filesystems. System evolution that makes the most of emerging hardware features such as Reliability, Availability, Serviceability (RAS) capabilities keeps your systems running in the face of hardware failures that would have caused previous generations of systems to crash.

    The primary responsibility of an operating system is to manage processing, memory, storage, and network resources; RHEL builds on this responsibility with advanced resource management features. Partitioning processing resources among applications with Control Groups and associated controllers can help those applications meet SLAs, reduce resource bottlenecks, and increase net system performance. Management of storage and networks includes availability and performance improvements.

    RHEL also provides a complete portfolio of security technologies with solutions for all facets of your system, data, and communications security challenges. A secure system depends on controlling access to services and data; features from Red Hat's SELinux have been applied to more system services and can be used to run untrusted applications with less risk to your system. Overall system security is improved by standardized deployment documentation that verifies patch installation and detects possible system compromises. Improvements to system identity services include a new service that can aggregate, normalize, and cache information from diverse identity servers.

    About MongoDB

    MongoDB is a scalable, high-performance, open source NoSQL database. MongoDB is built around storing JSON-style documents, allowing data to be schema free and more dynamically represented. It offers full index support on any attribute, which makes queries fast and flexible, and supports atomic in-place document updates for high-performance, consistent updates. MongoDB is built from the ground up to scale for high-performance database access. High availability and read scaling are accomplished via Replica Sets, an asynchronous data replication mechanism where data is written to a primary node and replicated to any number of secondary nodes. Applications can be configured to read from the primary or any of the secondaries, providing support for high-performance reads. MongoDB also scales horizontally by automatically partitioning your data across multiple shards without a single point of failure. Routing requests to the appropriate shard is handled by MongoDB internal mechanisms so that applications can take advantage of performance increases without being updated and redeployed.

    MongoDB Components


    Mongod is the primary component of MongoDB, responsible for the structured storage system. In single-node or replica set deployments the only component running is mongod. When deploying sharded configurations, mongos and config servers are also necessary.


    Mongos is a routing and coordination process that makes the mongod nodes in the cluster look like a single system. Mongos processes route data requests, keeping a cached copy of config server information in memory. Any changes that occur on the config servers are propagated to each mongos process. Mongos processes may be run on the shard servers themselves, but they are lightweight enough to exist on each application server. Many mongos processes can be run simultaneously since these processes do not coordinate between one another.


    Config server is a mongod process used to synchronously replicate the state information of a sharded environment. In a sharded environment, config servers store the metadata of the cluster. Although a config server can run standalone, production deployments should run three individual config server instances with copies of the same metadata (for data safety).
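
    As a rough illustration of how a mongos router is pointed at its config servers, the following sketch builds a mongos command line from a list of config server hosts. The function name and the host names are hypothetical; the --configdb option is the mongos setting that lists config servers, and port 27019 is the config server port listed in the port table of this paper. The check for one or three hosts reflects the standalone and production layouts described above.

```shell
#!/usr/bin/env bash
# Sketch: build a mongos command line from config server host names.
# build_mongos_cmd and the host names used with it are illustrative only.
build_mongos_cmd() {
  local port=27019   # config server port, as listed in this paper
  local list="" host

  # One config server (standalone) or three (production) are expected.
  if [ "$#" -ne 1 ] && [ "$#" -ne 3 ]; then
    echo "expected 1 or 3 config servers, got $#" >&2
    return 1
  fi

  # Join the hosts into host:port,host:port,... form.
  for host in "$@"; do
    list="${list:+$list,}${host}:${port}"
  done

  echo "mongos --configdb ${list}"
}
```

    For example, build_mongos_cmd cfg1 cfg2 cfg3 prints a command that could be reviewed and then run on each application server.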

    Deployment Architectures

    Building upon these components, the following diagrams illustrate the architecture of various deployment types. Beginning with a single node, the following steps and diagrams can be used to guide you through a production deployment as the scale of the database load grows. The following should be used as reference designs, or starting points, for your own deployment.

    Mongod primary and secondary nodes are configured in a similar fashion, which allows replica set deployments to be completed quickly. Sharded configurations require the additional configuration of mongos routers and mongod metadata config instances. For more information about deployment configurations, refer to the MongoDB Replica Set1 or MongoDB Sharding2 documentation.

    Single MongoDB node

    The basic building block for all other configurations

    Not intended for production use, as it introduces a single point of failure

    Multi-node replica set

    Requires an odd number of nodes (for primary election purposes)

    Replica sets are used for data redundancy

    Apps can also be updated to read from secondary nodes, supporting read scaling

    Spreading a replica set across data centers

    Ensures high-availability access to data even if an entire data center is lost

    Sharding using replica sets as the base for each data shard

    Config servers manage metadata for sharded configurations (shard keys, data balancing)

    Mongos routes requests from applications to the correct data shard
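
    To make the odd-member rule for replica sets concrete, here is a minimal sketch that generates an rs.initiate() call for the mongo shell from a set name and a list of members. The function name, the set name rs0, and the host names are assumptions made for illustration; each mongod would also need to be started with a matching replica set name for the generated call to succeed.

```shell
#!/usr/bin/env bash
# Sketch: print an rs.initiate() call for a replica set.
# build_rs_initiate, the set name, and the hosts are illustrative only.
build_rs_initiate() {
  local name=$1
  shift

  # An odd member count lets a majority of nodes elect a primary.
  if [ "$#" -lt 1 ] || [ $(( $# % 2 )) -eq 0 ]; then
    echo "need an odd number of members, got $#" >&2
    return 1
  fi

  # Build the members array, numbering members from 0.
  local members="" id=0 host
  for host in "$@"; do
    members="${members:+$members, }{ _id: ${id}, host: \"${host}\" }"
    id=$(( id + 1 ))
  done

  echo "rs.initiate({ _id: \"${name}\", members: [ ${members} ] })"
}
```

    The output can be pasted into a mongo shell connected to one of the members, or passed to the shell with its --eval option.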


    [Figure: deployment architecture diagrams. Recoverable labels: Replica Set; Data Center A, Data Center B, Data Center C; Mongos; Application; Shard 1, Shard 2, Shard 3.]

    MongoDB on Red Hat Enterprise Linux

    Installing MongoDB

    To get started with MongoDB, first complete the installation and setup for a base Red Hat Enterprise Linux server system. Depending on the type of installation completed, you may need to add an additional group and administrative-level user (if they weren't created during installation). You will also need to register the system with Red Hat in order to use package management utilities and receive system updates. For more information, refer to the RHEL Product Subscriptions and Entitlements3.


    The steps documented below show how to configure a single MongoDB instance. These steps can be used for every MongoDB node in your cluster, whether deploying a single instance, a replica set, or a sharded configuration. Before continuing, confirm that sshd is running; this is how we'll access the nodes in our cluster once deployed (outside of MongoDB processes).

    $ sudo service sshd status

    If sshd is not running, start it:

    $ sudo service sshd start

    Once the system is set up, we can proceed with installing MongoDB. The first step is to add a 10gen repository to yum:

    $ echo "[10gen]

    name=10gen Repository


    gpgcheck=0" | sudo tee -a /etc/yum.repos.d/10gen.repo

    Next, install the latest version of MongoDB and sysstat utilities:

    $ sudo yum -y install mongo-10gen-server

    $ sudo yum -y install sysstat

    Now edit the mongod configuration file and set the following properties:

    $ sudo nano /etc/mongod.conf




    Depending on your configuration, you may also want to configure oplogSize in mongod.conf. MongoDB Replica Sets use an operation log (oplog) to store write operations, which are then relayed asynchronously to the secondary nodes in the set. By default, the oplogSize parameter is set to be equivalent to 5% of the available disk space. Generally, this is a reasonable start, but it can be adjusted over time based on your deployment. For more information, refer to the MongoDB Replica Sets Oplog4 documentation.
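
    As a worked example of the 5% default, the sketch below converts free disk space into an oplog size in megabytes, which is the unit oplogSize is expressed in. The function names are hypothetical, and the use of /data as the data path is an assumption based on the mount point used later in this paper.

```shell
#!/usr/bin/env bash
# Sketch: estimate the default oplog size (5% of free space) in megabytes.
# Function names are illustrative only.

# Convert free space in kilobytes to 5% of that space in megabytes.
oplog_size_mb() {
  local free_kb=$1
  echo $(( free_kb * 5 / 100 / 1024 ))
}

# Report free kilobytes on the filesystem holding a path.
# With POSIX output (-P), column 4 of the second line is available space.
free_kb_for() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

    For a volume with 100 GB free (104857600 KB), oplog_size_mb returns 5120. On a live system, oplog_size_mb "$(free_kb_for /data)" gives a starting value to compare against your write volume.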

    Storage Configuration

    At this point MongoDB has been installed, but before we start the service we must configure the data storage (dbpath parameter above). There are several options to consider when configuring database storage including volume arrangement, filesystem and encryption.

    Volume Management

    With volumes, you have a choice between using logical volumes (which map to multiple physical volumes) or using a RAID configuration (hardware or software-based). Logical volumes (LVM) allow you to map multiple physical volumes to a single logical device, with options for striping or mirroring data across the physical devices. For more information on LVM refer to the RHEL Logical Volume Manager5 documentation. The recommended approach is to use a RAID10 disk configuration, which is a combination of RAID0 (striping) and RAID1 (mirroring). This configuration typically provides the best combination of performance and reliability. To set up software-based RAID10 storage, install and use the mdadm tools to configure the volumes.


    $ sudo yum install mdadm

    $ sudo mdadm --create /dev/md0 -l10 -n4 /dev/sda[1-4]

    This example shows how 4 (-n4) physical volumes attached to the system (e.g. /dev/sda1, /dev/sda2, /dev/sda3, /dev/sda4) can be configured to use RAID10 (-l10). Once mdadm has completed you can create a filesystem and mount the device.
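
    Because RAID10 mirrors every block, the usable capacity of the array is smaller than the raw capacity of its members. The following sketch works through that arithmetic, assuming the default of two copies of each block; the function name and the 500 GB member size are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: usable capacity of a RAID10 array.
# raid10_usable_gb is an illustrative name; sizes are whole gigabytes.
raid10_usable_gb() {
  local devices=$1 size_gb=$2 copies=${3:-2}
  # Raw capacity divided by the number of copies kept of each block.
  echo $(( devices * size_gb / copies ))
}
```

    With four 500 GB members, raid10_usable_gb 4 500 returns 1000. Once the array exists, its state can be inspected with cat /proc/mdstat or sudo mdadm --detail /dev/md0.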


    When using RHEL there are two recommended filesystems: ext4 and XFS. Ext4 uses extents, which improve performance and reduce metadata overhead when working with large files. In addition, ext4 marks unallocated block groups and inode table sections, which allows them to be skipped during a file system check. This makes for quicker file system checks, which becomes more beneficial as the file system grows in size. XFS is a scalable, high-performance filesystem created to support extremely large filesystems. XFS supports metadata journaling, which facilitates quicker crash recovery. The XFS file system can also be defragmented and enlarged while mounted and active. In addition, RHEL supports backup and restore utilities specific to XFS. Note: RHEL also supports the ext3 filesystem; however, it is not recommended for use with MongoDB due to issues with file allocation and large file access.

    To create an ext4 filesystem on a new device (ex. /dev/md0), use the following command:

    $ sudo mkfs.ext4 /dev/md0

    For XFS, use the following:

    $ sudo mkfs.xfs /dev/md0

    After creating the filesystem we'll add an entry to the filesystem table to use when mounting the device. The following command adds an entry to /etc/fstab for the storage device /dev/md0 and a mount point /data along with information about the filesystem we're using (ext4 or XFS) and a number of mount options that are recommended for MongoDB data storage (defaults,auto,noatime,noexec).

    Using ext4:

    $ echo "/dev/md0 /data ext4 defaults,auto,noatime,noexec 0 0" | sudo tee -a /etc/fstab

    Or using XFS:

    $ echo "/dev/md0 /data xfs defaults,auto,noatime,noexec 0 0" | sudo tee -a /etc/fstab
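
    When the same entry has to be produced on many nodes, it can help to generate the fstab line from its parts and reject unexpected filesystem types. This is a minimal sketch under that assumption; the function name is hypothetical, and the mount options are the ones recommended above.

```shell
#!/usr/bin/env bash
# Sketch: generate an /etc/fstab line for MongoDB data storage.
# fstab_entry is an illustrative name, not part of any RHEL tool.
fstab_entry() {
  local device=$1 mount_point=$2 fstype=$3

  # Only the two filesystems recommended in this paper are accepted.
  case "$fstype" in
    ext4|xfs) ;;
    *) echo "unsupported filesystem: $fstype" >&2; return 1 ;;
  esac

  printf '%s %s %s defaults,auto,noatime,noexec 0 0\n' \
    "$device" "$mount_point" "$fstype"
}
```

    The output can be reviewed and then appended with fstab_entry /dev/md0 /data xfs | sudo tee -a /etc/fstab.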

    With that complete, we can continue setting up the rest of the data storage. First create the mount point /data and then mount the storage device:

    $ sudo mkdir /data

    $ sudo mount /dev/md0

    Set the ownership of the mount point to the MongoDB user (mongod) and group (mongod) that were created during the package installation:

    $ sudo chown mongod:mongod /data

    Now we can start MongoDB, connect to the system from a local mongo client, and save a test document to confirm that storage is functioning as expected:

    $ sudo service mongod start

    $ mongo

    MongoDB shell version: 2.0.4

    connecting to: test

    > db.testCollection.save({ a : 1 })

    > db.testCollection.findOne()

    { "_id" : ObjectId("4f64befde229ee93b9172111"), "a" : 1 }

    Storage Encryption

    When considering encryption, RHEL provides two options: a pseudo-filesystem that provides data and filename encryption, or whole disk encryption. The key difference between the two options is whether data is protected while in use or at rest. The pseudo-filesystem approach encrypts data while in use by adding a secure layer over the mounted device. While mounted (in use), data on the device is encrypted. Full disk encryption is more appropriate when the desired approach is to maintain disk encryption after the system has been powered off. Each has different use cases; depending upon your deployment considerations, one or both may be appropriate.

    The first option requires little setup; by issuing an additional mount command (before starting mongod) the pseudo-filesystem eCryptfs is configured and mounted (configuration occurs upon initial launch):

    $ sudo mount -t ecryptfs /data /data

    Note that the last two arguments (source and destination) are the same. This ensures that any access to /data is encrypted and that the secured content stays consistent.

    If you are interested in setting up at-rest full disk encryption, you'll need to perform additional steps. Linux Unified Key Setup-on-disk-format (or LUKS) allows you to encrypt partitions on your system; refer to the RHEL LUKS Disk Encryption6 documentation for more information.

    We have completed the setup and installation of MongoDB. The above steps should be repeated on each node in your cluster, regardless of whether you are setting up a replica set or sharded deployment.

    Securing MongoDB

    With the initial setup complete, we now focus on securing the system. To secure the server, we will utilize two mechanisms built into RHEL: TCP Wrappers and Netfilter. For network services that utilize it, TCP Wrappers adds an additional layer of protection by defining which hosts are or are not allowed to connect to wrapped network services. Netfilter, controlled by the iptables administration tool, provides stateful and stateless packet filtering as well as routing and connection state management. Using one or both of these mechanisms can help safeguard your servers. Use of these services is optional but highly recommended when deploying production systems.

    To use TCP Wrappers, a service must either be compiled with support for it or be managed by a wrapped process. With MongoDB the latter approach should be used, with the mongod process controlled via xinetd, a super-server daemon that manages networked services such as FTP or Telnet. When a client attempts to connect to a network service controlled by xinetd, the super service receives the request and checks for any TCP Wrappers access control rules. For more information on how to configure and use this approach, refer to the RHEL TCP Wrappers and xinetd7 documentation.

    To use Netfilter, we'll create IP filtering rules using the iptables admin tool. This will allow us to specify port- and protocol-level access. By default the server is set up to only accept connections via SSH, and we'll be adding rules to cover all of MongoDB's common ports. The specific ports are:

    27017  MongoDB core process (mongod)

    27018  MongoDB shard server (mongod started with --shardsvr; mongos itself defaults to 27017)

    27019  MongoDB config server (mongod)

    28017  MongoDB web-based statistics interface (this defaults to the port that mongod uses plus 1000)

    Depending on your deployment, it may be necessary to restrict access to the server and only allow connections from specific addresses. The following commands show an example rule for the mongod core process (port 27017) with additional examples of how to add rules for each of MongoDB's ports.

    $ sudo iptables -A INPUT -m state --state NEW,ESTABLISHED -s 192.168.x.y -p tcp --dport 27017 -j ACCEPT

    If your deployment will include sharding, then the following example rules can be used to open up the required ports:

    $ sudo iptables -A INPUT -m state --state NEW,ESTABLISHED -s 192.168.x.y -p tcp --dport 27018 -j ACCEPT

    $ sudo iptables -A INPUT -m state --state NEW,ESTABLISHED -s 192.168.x.y -p tcp --dport 27019 -j ACCEPT

    Finally, to enable access to the web-based status interface:

    $ sudo iptables -A INPUT -m state --state NEW,ESTABLISHED -p tcp --dport 28017 -j ACCEPT

    After the rules have been added, save the iptables rules to persist them to disk:

    $ sudo service iptables save

    iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]
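
    Since the rules above differ only in the destination port, they can be generated from a source address and a port list. The sketch below prints the rules rather than applying them, so they can be reviewed first; the function name and the 192.168.1.0/24 source used in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print one iptables ACCEPT rule per MongoDB port for a given source.
# mongo_iptables_rules is an illustrative name; nothing is applied here.
mongo_iptables_rules() {
  local src=$1
  shift

  local port
  for port in "$@"; do
    echo "iptables -A INPUT -m state --state NEW,ESTABLISHED -s ${src} -p tcp --dport ${port} -j ACCEPT"
  done
}
```

    For example, mongo_iptables_rules 192.168.1.0/24 27017 27018 27019 prints three rules that can be run with sudo after review and then persisted with service iptables save.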

    6 _Guide/sect-Security _Guide-LUKS_Disk_Encryption.html

    7 _Guide/sect-Security _Guide-TCP_Wrappers_and_xinetd.html

    The rules we've added allow incoming connections to the ports needed for MongoDB from addresses in the 192.168.x.y block; note that the example rule for port 28017 has no -s argument and therefore accepts connections from any source. Depending upon your cluster configuration, it may be necessary to supply arguments to restrict access to specific sources (for example, -s XX.XX.XX.0/24 only allows access from sources located on the same subnet). In addition to setting the source address when using iptables firewall rules, it is also important to set the bind_ip parameter in the MongoDB configuration file. This will cause MongoDB to only listen on a specific address when accepting incoming connections. The examples above will need to be amended for your deployment; refer to the RHEL Security Guide on IPTables8 for more information.

    Database Backups

    Depending upon the configuration of your server(s), there are multiple ways to back up the data stored in MongoDB. One option is a volume-specific backup, where the database is issued an fsync+lock command, which locks the system against incoming writes; backups are then conducted and the database is unlocked. Note that this approach should only be used with a secondary node within a replica set (or within a replica set inside of a shard) in order to prevent errors in your app when attempting to write data.

    First connect to a secondary server (as noted in the mongo prompt SECONDARY>), switch to the admin database and issue the lock command:

    $ mongo

    MongoDB shell version: 2.0.4

    connecting to: test

    SECONDARY> use admin

    switched to db admin

    SECONDARY> db.fsyncLock()

    {
        "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
        "seeAlso" : "display/DOCS/fsync+Command",
        "ok" : 1
    }

    While the database is locked, conduct any backups of the volume or the directory where MongoDB stores data files. Once those backup operations are completed, unlock the database so that writes may be accepted once again:

    SECONDARY> db.fsyncUnlock()

    { "ok" : 1, "info" : "unlock completed" }
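
    The lock, back up, unlock sequence can also be scripted so that the unlock step is not forgotten when the archive step fails. The following is a minimal sketch, not a tested backup tool: the function name, the use of tar for the copy, and the MONGO_BIN variable are assumptions, and it assumes that the lock taken by one mongo invocation stays in place until a later invocation unlocks it. It should be run on a secondary, as described above.

```shell
#!/usr/bin/env bash
# Sketch: archive MongoDB data files while the database is fsync-locked.
# backup_with_lock and MONGO_BIN are illustrative names.
MONGO_BIN=${MONGO_BIN:-mongo}

backup_with_lock() {
  local data_dir=$1 archive=$2 status=0

  # Flush pending writes to disk and block new ones.
  "$MONGO_BIN" admin --quiet --eval 'printjson(db.fsyncLock())' || return 1

  # Copy the data files while the database is locked.
  tar -czf "$archive" -C "$data_dir" . || status=$?

  # Always unlock, even if the archive step failed.
  "$MONGO_BIN" admin --quiet --eval 'printjson(db.fsyncUnlock())' || status=$?

  return "$status"
}
```

    A call such as backup_with_lock /data /backup/mongo-data.tar.gz (paths hypothetical) would archive the data directory; the archive destination should live on a different volume than the data files.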

    If your data storage was configured to use the XFS filesystem, RHEL includes filesystem-level utilities that can be used for backups. The xfsdump command conducts backups or dumps of active XFS filesystems to another storage device. The command also includes the ability to conduct incremental backups by specifying backup levels. For more information, refer to the xfsdump man page9.
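
    To illustrate backup levels, the sketch below assembles an xfsdump command for a given level, where level 0 is a full dump and higher levels are incremental relative to the most recent lower-level dump. The function name, the session and media labels, and the destination path in the usage note are hypothetical; the labels are supplied so that xfsdump does not need to prompt for them.

```shell
#!/usr/bin/env bash
# Sketch: print an xfsdump command for a given dump level (0-9).
# xfsdump_cmd and the label values are illustrative only.
xfsdump_cmd() {
  local level=$1 fs=$2 dest=$3

  # xfsdump accepts dump levels 0 through 9.
  case "$level" in
    [0-9]) ;;
    *) echo "level must be between 0 and 9" >&2; return 1 ;;
  esac

  # -l: dump level, -L: session label, -M: media label, -f: destination.
  echo "xfsdump -l ${level} -L mongodb-level${level} -M backup-media -f ${dest} ${fs}"
}
```

    For example, xfsdump_cmd 0 /data /backup/data.level0.dump prints a full-dump command, and the same call with level 1 prints the command for an incremental dump.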

    If the data storage was configured using the logical volume manager (LVM) another option is to use LVM snapshots. LVM supports the creation of snapshot volumes from existing logical volumes. This allows you to run backup operations on the snapshot without disturbing the active logical volume. If MongoDB has been configured to use journaling (on by default) and if the journal files are also on the volume being snapshotted, then there is no need to fsync+lock the database before taking an LVM snapshot.
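
    An LVM snapshot backup usually follows a create, mount, copy, remove sequence. The sketch below prints that sequence as a plan instead of executing it. The function name and every volume group, logical volume, size, and path passed to it are hypothetical, and the read-only mount shown assumes an ext4 volume; an XFS snapshot typically also needs the nouuid mount option.

```shell
#!/usr/bin/env bash
# Sketch: print the commands for an LVM snapshot backup of a data volume.
# lvm_snapshot_plan is an illustrative name; nothing is executed here.
lvm_snapshot_plan() {
  local vg=$1 lv=$2 snap=$3 size=$4 mnt=$5 archive=$6

  cat <<EOF
lvcreate --snapshot --size ${size} --name ${snap} /dev/${vg}/${lv}
mkdir -p ${mnt}
mount -o ro /dev/${vg}/${snap} ${mnt}
tar -czf ${archive} -C ${mnt} .
umount ${mnt}
lvremove -f /dev/${vg}/${snap}
EOF
}
```

    Calling lvm_snapshot_plan vg0 mongodb mongo-snap 1G /mnt/mongo-snap /backup/mongo.tar.gz prints six commands to run as root. The snapshot size only needs to hold the blocks that change while the backup runs.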

    Another option is to use one of MongoDB's bundled tools, mongodump, which can be used to back up specific collections, databases, or all data. Paired with a cron job, mongodump can be set to run on a regular basis to conduct full data backups. An example backup script could take the form of the following:


    suffix=$(date +%w)

    mkdir -p /home/username/backup/mongo-$suffix

    /usr/bin/mongodump -o /home/username/backup/mongo-$suffix

    This script uses the current day of the week (0-6, from date +%w) in the backup directory name and saves all data from the MongoDB database in the specified directory, so each run reuses the directory written one week earlier. To run this script every night at 12am use the crontab utility:

    $ crontab -e

    8 _Guide/sect-Security _Guide-IPTables.html9

    And add the following line:

    0 0 * * * /bin/bash /path/to/backup/

    For more information on configuring cron jobs, refer to the crontab10 documentation. Mongodump can be used with single instances, replica sets, or sharded configurations. In the latter two deployment scenarios, there are additional parameters11 required to ensure consistent backups, which are documented on the MongoDB site.

    For more information about backing up data or details about advanced configurations, refer to the MongoDB Backups12 documentation.

    System Monitoring

    MongoDB and RHEL include several tools to help monitor the performance of your database instances. MongoDB includes mongostat (exposes internal system metrics), a query profiler and diagnostic tools (via the mongo shell). 10gen also offers MongoDB Monitoring Service, which is a free SaaS solution for proactive monitoring of your MongoDB cluster. MMS requires minimal setup and can be deployed onto your cluster quickly. To learn more about MMS, refer to the 10gen MongoDB Monitoring Service13 site. For more information about MongoDB monitoring tools, refer to the MongoDB Monitoring and Diagnostics14 documentation.

    Outside of MongoDB, there are several system utilities that can be used to report system resource usage. For monitoring CPU utilization at predefined intervals you can use sar, which allows you to collect and report system activity information. Similar to sar, iostat provides statistics about CPU usage, devices, partitions, and network filesystems. For monitoring memory usage, vmstat provides information about processes, memory, paging, and block I/O.

    In addition, RHEL also provides dynamic resource management via Tuned and ktune. Tuned is a daemon that monitors the use of system components and dynamically tunes system settings based on that monitoring information. Dynamic tuning accounts for the way that various system components are used differently throughout the uptime of any given system. For example, the hard drive is used heavily during startup and login, but is barely used later. Tuned monitors the activity of these components and reacts to changes in their use.

    To get started, first install the Tuned package via yum:

    $ sudo yum install tuned

    Next use the tuned-adm command to activate the throughput-performance tuned profile. This profile adjusts the system settings to optimize for high-throughput applications. Specifically, it disables tuned and ktune power saving mechanisms, enables sysctl settings that improve the throughput performance of your disk and network I/O, and switches to the deadline scheduler. The built-in profiles are also intended to be a starting point; if you are interested in further system optimizations, duplicate the current profile (available in /etc/tune-profiles) and adjust its settings:

    $ sudo cp -a /etc/tune-profiles/throughput-performance /etc/tune-profiles/myprofile

    Inside the newly created profile, the following four files contain system optimization settings. Review and edit these files as needed to optimize the system to your specifications.

    tuned.conf: The configuration for the tuned service to be active for this profile.

    sysctl.ktune: The sysctl settings used by ktune. The format is identical to the /etc/sysconfig/sysctl file (refer to the sysctl and sysctl.conf man pages).

    ktune.sysconfig: The configuration file of ktune itself, typically /etc/sysconfig/ktune.

    ktune.sh: An init-style shell script used by the ktune service which can run specific commands during system startup to tune the system.
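
    Activating a profile, whether the built-in throughput-performance profile or a customized copy such as myprofile, might look like the following sketch. It relies on the tuned-adm profile and tuned-adm active subcommands; the function name and the TUNED_ADM variable are hypothetical, and the function is assumed to run as root.

```shell
#!/usr/bin/env bash
# Sketch: switch to a tuned profile and report which profile is active.
# activate_tuned_profile and TUNED_ADM are illustrative names.
TUNED_ADM=${TUNED_ADM:-tuned-adm}

activate_tuned_profile() {
  local profile=${1:-throughput-performance}

  # Switch profiles; stop here if the profile could not be applied.
  "$TUNED_ADM" profile "$profile" || return 1

  # Print the currently active profile for confirmation.
  "$TUNED_ADM" active
}
```

    Running activate_tuned_profile with no argument selects throughput-performance; passing myprofile selects the customized copy created above.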

    10 - ImportExportTools-mongodump12

    Another tool available in RHEL is SystemTap, an application profiler. It is a tracing and probing tool that allows users to study and monitor the activities of the operating system (particularly, the kernel) in fine detail. It provides information similar to the output of tools like netstat, ps, top, and iostat; however, SystemTap is designed to provide more filtering and analysis options for collected information. It is most useful when other similar tools cannot precisely pinpoint a bottleneck in the system, requiring a deep analysis of system activity.

    To get started with SystemTap, install support for it via yum:

    $ sudo yum install systemtap systemtap-runtime

    The RHEL SystemTap Beginners Guide15 contains links to several useful SystemTap scripts. For MongoDB, possible areas to examine include system networking and disk I/O. The scripts for network profiling16 and summarizing disk traffic17 could be used as a starting point for analyzing system performance. Refer to the SystemTap guides and wiki18 for more information on performance analysis and reporting.


    Summary

    Red Hat Enterprise Linux provides a robust platform for deploying MongoDB. With features and functionality that emphasize security, reliability and performance, RHEL enables you to deploy MongoDB instances that can meet production needs.

    For more information about MongoDB, see

    For more information about Red Hat Enterprise Linux, see

    15 -nettopsect17 - disktop18

