MongoDB (from “humongous”) is a cross-platform, document-oriented database. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. Released under a combination of the GNU Affero General Public License and the Apache License, MongoDB is free and open-source software.
The software company 10gen (now MongoDB Inc.) began developing MongoDB in October 2007 as a component of a planned platform-as-a-service product, then shifted to an open source development model in 2009, with 10gen offering commercial support and other services. Since then, MongoDB has been adopted as backend software by a number of major websites and services, including Craigslist, eBay, Foursquare, SourceForge, Viacom, and the New York Times. MongoDB is widely ranked as the most popular NoSQL database system.
HOW DOES MONGODB WORK?
MongoDB stores data using a flexible document data model that is similar to JSON. Documents contain one or more fields, including arrays, binary data and sub-documents. Fields can vary from document to document. This flexibility allows development teams to evolve the data model rapidly as their application requirements change.
Developers access documents through rich, idiomatic drivers available in all popular programming languages. Documents map naturally to the objects in modern languages, which allows developers to be extremely productive. Typically, there’s no need for an ORM layer.
MongoDB provides auto-sharding for horizontal scale-out. Native replication and automatic leader election support high availability across racks and data centers. MongoDB also makes extensive use of RAM, providing in-memory speed with on-disk capacity.
Unlike most NoSQL databases, MongoDB provides comprehensive secondary indexes, including geospatial and text search, as well as extensive security and aggregation capabilities. MongoDB provides the features you need to build the majority of new applications your organization develops today.
What is MongoDB?
MongoDB is the database for today’s applications: innovative, fast time-to-market, globally scalable, reliable, and inexpensive to operate.
With MongoDB, you can build applications that were never possible with traditional relational databases. Here’s how.
- Fast, Iterative Development. Scope creep and changing business requirements no longer stand between you and successful project delivery. A flexible data model coupled with dynamic schema and idiomatic drivers makes it fast for developers to build and evolve applications. Automated provisioning and management enable continuous integration and highly productive operations. Contrast this with the static relational schemas and complex operations that have hindered you in the past.
- Flexible Data Model. MongoDB’s document data model makes it easy for you to store and combine data of any structure, without giving up sophisticated data access and rich indexing functionality. You can dynamically modify the schema without downtime. You spend less time prepping your data for the database, and more time putting your data to work.
- Multi-Datacenter Scalability. MongoDB can be scaled within and across multiple distributed data centers, providing new levels of availability and scalability. As your deployments grow in terms of data volume and throughput, MongoDB scales easily with no downtime, and without changing your application. And as your availability and recovery goals evolve, MongoDB lets you adapt flexibly, across data centers, with tunable consistency.
- Integrated Feature Set. Analytics, text search, geospatial, in-memory performance and global replication allow you to deliver a wide variety of real-time applications on one technology, reliably and securely. To do this well, an RDBMS requires additional, complex technologies, each with its own integration overhead and expense.
- Lower TCO. Application development teams are more productive when they use MongoDB, and single-click management makes operations teams more productive as well. MongoDB runs on commodity hardware, dramatically lowering costs. Finally, MongoDB offers affordable annual subscriptions, including 24×365 global support. Your applications can cost as little as one tenth as much to deliver as they would with a relational database.
- Long-Term Commitment. MongoDB Inc and the MongoDB ecosystem stand behind the world’s fastest-growing database. 8M+ downloads. 1,000+ customers including 30 of the Fortune 100. Over 650 partners. Greater venture capital funding than any other database in history. You can be sure your investment is protected.
Want to go deeper into MongoDB’s technology? Then read on for key highlights, or download our detailed Architecture Guide.
- MongoDB Data Model
- MongoDB Query Model
- MongoDB Data Management
- MongoDB Consistency & Availability
- Management and Operations
MongoDB Data Model
This section covers 2 topics: Data as Documents and Dynamic Schemas.
DATA AS DOCUMENTS
MongoDB stores data as documents in a binary representation called BSON (Binary JSON). Documents that share a similar structure are typically organized as collections. You can think of collections as analogous to tables in a relational database: documents are similar to rows, and fields are similar to columns.
MongoDB documents tend to have all data for a given record in a single document, whereas in a relational database information for a given record is usually spread across many tables.
For example, consider the data model for a blogging application. In a relational database, the data model would comprise multiple tables such as Categories, Tags, Users, Comments and Articles. In MongoDB the data could be modeled as two collections, one for users and the other for articles. Each article document might contain multiple comments, multiple tags, and multiple categories, each expressed as an embedded array.
“Data as documents: simpler for developers, faster for users.”
As a result of the document model, data in MongoDB is more localized, which greatly reduces the need to JOIN separate tables. Because a single read to the database can retrieve the entire document, the result is dramatically higher performance and scalability on commodity hardware.
In addition, MongoDB documents are more closely aligned to the structure of objects in the programming language. This makes it simpler and faster for developers to model how data in the application will map to data stored in the database.
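To make the blogging example concrete, here is a minimal Python sketch of one article stored as a single document, with comments, tags and categories embedded as arrays. The `Article` and `Comment` classes and all field names are hypothetical. The point is that the application object converts directly into a nested, JSON-like structure with `dataclasses.asdict`, so no ORM-style mapping layer is needed and one read returns the whole record.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Comment:
    # Hypothetical comment fields; stored embedded inside the article document.
    author: str
    text: str


@dataclass
class Article:
    # Hypothetical article fields for the blogging example.
    title: str
    author_id: int
    body: str
    tags: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)


article = Article(
    title="Modeling a blog in MongoDB",
    author_id=42,
    body="Articles, users and comments as documents.",
    tags=["mongodb", "schema-design"],
    categories=["databases"],
    comments=[
        Comment(author="ana", text="Nice overview"),
        Comment(author="raj", text="What about joins?"),
    ],
)

# asdict() recursively converts nested dataclasses (including those inside lists)
# into plain dicts and lists: the shape a driver would store as one BSON document.
doc = asdict(article)
print(doc["comments"][1]["text"])
```

Assuming a PyMongo database handle named `db`, `db.articles.insert_one(doc)` would persist the entire record, embedded comments included, in a single write.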
DYNAMIC SCHEMAS
MongoDB documents can vary in structure. For example, all documents that describe users might contain the user id and the last date they logged into the system, but only some of these documents might contain the user’s identity for one or more third-party applications.
Fields can vary from document to document; there is no need to declare the structure of documents to the system – documents are self-describing. If a new field needs to be added to a document then the field can be created without affecting all other documents in the system, without updating a central system catalog, and without taking the system offline.
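The user example above can be sketched as two documents from the same collection, only one of which carries third-party identities. The field names and values are hypothetical; the sketch shows that code reading self-describing documents simply treats optional fields as optional.

```python
# Two hypothetical documents from the same "users" collection.
users = [
    {"_id": 1, "user_id": "u1", "last_login": "2015-03-01"},
    {
        "_id": 2,
        "user_id": "u2",
        "last_login": "2015-03-02",
        # Only some users carry third-party identities; no schema change is needed.
        "identities": [{"provider": "twitter", "handle": "@u2"}],
    },
]


def third_party_providers(user: dict) -> list:
    # Documents are self-describing, so a missing field just means "none".
    return [ident["provider"] for ident in user.get("identities", [])]


for user in users:
    print(user["user_id"], third_party_providers(user))
```

Adding a new field to one existing document would, in PyMongo, be a single `update_one(filter, {"$set": {...}})` call that leaves every other document untouched.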
“MongoDB enables developers to design and evolve the schema through an iterative and agile approach.”
Developers can start writing code and persist objects as they are created. As developers add more features, MongoDB continues to store the updated objects without costly ALTER TABLE operations or, worse, having to redesign the schema from scratch.
How does the MongoDB data model stack up to relational databases and key-value stores? Take a look at the table below:
| | MongoDB | Relational | Key-Value |
|---|---|---|---|
| Rich Data Model | Yes | No | No |
| Easy for Programmers | Yes | No | Not When Modelling Complex Data Structures |
MongoDB Query Model
This section covers 3 topics: Idiomatic Drivers, Query Types, and Indexing.
“With the intuitive document data model, dynamic schema and idiomatic drivers, you can build applications and get to market faster with MongoDB.”
QUERY TYPES
MongoDB supports many types of queries for highly scalable operational and analytic applications. A query may return a document or a subset of specific fields within the document:
- Key-value queries return results based on any field in the document, often the primary key.
- Range queries return results based on values defined as inequalities (e.g. greater than, less than or equal to, between).
- Geospatial queries return results based on proximity criteria, intersection and inclusion as specified by a point, line, circle or polygon.
- Text Search queries return results in relevance order based on text arguments using Boolean operators (e.g., AND, OR, NOT).
- Aggregation Framework queries return aggregations of values returned by the query (e.g., count, min, max, average), similar to a SQL GROUP BY statement.
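Each query type above corresponds to a filter or pipeline document. The sketch below builds them as plain Python dicts using MongoDB’s query operators (`$gte`, `$lt`, `$near` with `$geometry`, `$text`/`$search`, `$match`/`$group`); the field names (`age`, `location`, `category`, `views`) are hypothetical. Because the snippet runs without a server, `matches_range` is a toy evaluator added only to show what a range filter means; it is not MongoDB’s query engine.

```python
# Key-value query: match on one field, here the primary key _id.
key_value = {"_id": 1001}

# Range query: 18 <= age < 65.
range_query = {"age": {"$gte": 18, "$lt": 65}}

# Geospatial query: documents within 1000 meters of a GeoJSON point
# given as [longitude, latitude]; requires a 2dsphere index on "location".
geo_query = {
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-73.99, 40.73]},
            "$maxDistance": 1000,
        }
    }
}

# Text search: requires a text index on the searched fields.
text_query = {"$text": {"$search": "sharding replication"}}

# Aggregation pipeline, roughly: SELECT category, COUNT(*), AVG(views) ... GROUP BY category.
pipeline = [
    {"$match": {"views": {"$gt": 0}}},
    {"$group": {"_id": "$category", "count": {"$sum": 1}, "avg_views": {"$avg": "$views"}}},
]


def matches_range(doc: dict, field: str, cond: dict) -> bool:
    """Toy evaluator for $gt/$gte/$lt/$lte conditions; illustration only."""
    ops = {
        "$gt": lambda a, b: a > b,
        "$gte": lambda a, b: a >= b,
        "$lt": lambda a, b: a < b,
        "$lte": lambda a, b: a <= b,
    }
    if field not in doc:
        return False
    return all(ops[op](doc[field], bound) for op, bound in cond.items())


people = [{"name": "a", "age": 17}, {"name": "b", "age": 30}, {"name": "c", "age": 65}]
selected = [p["name"] for p in people if matches_range(p, "age", range_query["age"])]
print(selected)
```

With a PyMongo collection handle `coll`, these would be used as `coll.find(range_query)` and `coll.aggregate(pipeline)`.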
“Unlike other NoSQL databases, MongoDB is not limited to simple key-value operations. You can build rich applications using complex queries and secondary indexes that unlock the value in structured, semi-structured, and unstructured data.
Native analytics, text search and geospatial features with tunable consistency and in-memory performance allow you to deliver a wide variety of real-time applications on one technology, reliably and securely.”
INDEXING
Indexes are a crucial mechanism for optimizing system performance and scalability while providing flexible access to your data. MongoDB includes support for many types of secondary indexes that can be declared on any field in the document, including fields within arrays:
- You can define compound, unique, array, TTL, geospatial, sparse, hashed and text indexes to optimize for multiple query patterns, multi-structured data types and constraints.
- The MongoDB query optimizer analyzes queries and automatically selects the most efficient execution plan. Developers can review and optimize plans using the powerful explain method and index filters.
- Index intersection enables MongoDB to use more than one index to optimize an ad-hoc query at run-time.
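The index types listed above are declared as key specifications plus options. The sketch below writes them as `(keys, options)` pairs in the form PyMongo’s `Collection.create_index` accepts (a list of `(field, direction_or_type)` tuples plus keyword options such as `unique`, `sparse` and `expireAfterSeconds`). All field names are hypothetical, and plain Python data is used so the snippet runs without a server.

```python
# Direction values used in MongoDB key specifications.
ASC, DESC = 1, -1

INDEX_SPECS = [
    # Compound index: serves queries on user_id, or user_id + created_at newest first.
    ([("user_id", ASC), ("created_at", DESC)], {}),
    # Unique index: rejects a second document with the same email.
    ([("email", ASC)], {"unique": True}),
    # TTL index: documents expire about 3600 seconds after their created_at date.
    ([("created_at", ASC)], {"expireAfterSeconds": 3600}),
    # Sparse index: only documents that contain "nickname" are indexed.
    ([("nickname", ASC)], {"sparse": True}),
    # Array index: indexing a field that holds arrays produces a multikey index.
    ([("tags", ASC)], {}),
    # Hashed index: supports hash-based sharding on user_id.
    ([("user_id", "hashed")], {}),
    # Geospatial index over GeoJSON points.
    ([("location", "2dsphere")], {}),
    # Text index for full-text search over title and body.
    ([("title", "text"), ("body", "text")], {}),
]


def describe(keys, options) -> str:
    # Render a spec compactly, e.g. "{email:1} [unique=True]".
    key_part = ", ".join(f"{name}:{kind}" for name, kind in keys)
    opt_part = ", ".join(f"{k}={v}" for k, v in options.items())
    return f"{{{key_part}}}" + (f" [{opt_part}]" if opt_part else "")


for keys, options in INDEX_SPECS:
    print(describe(keys, options))
```

Against a real collection `coll`, each pair would be applied as `coll.create_index(keys, **options)`, and `coll.find(query).explain()` shows which plan the optimizer chose.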
How does the MongoDB query and indexing model stack up to relational databases and key-value stores? Take a look at the table below:
| | MongoDB | Relational | Key-Value |
|---|---|---|---|
| Text Search | Yes | Expensive Add-On | No |
To learn more about the differences in data models, download our Relational Database to MongoDB Migration Guide
MongoDB Data Management
AUTO-SHARDING FOR LINEAR SCALABILITY
MongoDB provides horizontal scale-out for databases on low cost, commodity hardware using a technique called sharding, which is transparent to applications. Sharding distributes data across multiple physical partitions called shards. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application. MongoDB automatically balances the data in the cluster as the data grows or the size of the cluster increases or decreases.
“Sharding is transparent to applications; whether there is one or one hundred shards, the application code for querying MongoDB is the same.”
Unlike in relational databases, sharding in MongoDB is automatic and built into the database. Developers don’t face the complexity of building sharding logic into their application code and then updating it as shards are migrated. Operations teams don’t need to deploy additional clustering software to manage process and data distribution.
Unlike other NoSQL databases, MongoDB offers multiple sharding policies (hash-based, range-based and location-based) that let you distribute your data across a cluster according to query patterns or data locality. As a result, you get much higher scalability across a diverse set of workloads:
- Range-based Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values close to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range-based queries.
- Hash-based Sharding. Documents are distributed according to an MD5 hash of the shard key value. This approach yields a near-uniform distribution of writes across shards, provided the shard key has many distinct values, but is less efficient for range-based queries.
- Location-based Sharding. Documents are partitioned according to a user-specified configuration that associates shard key ranges with specific shards and hardware. Users can continuously refine the physical location of documents for application requirements such as locating data in specific data centers and multi-temperature storage.
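The three policies can be compared with a toy router. In this sketch the split points, zone names and shard names are hypothetical, and placing a hashed key with a modulo is a simplification: MongoDB actually assigns ranges of hashed values (chunks) to shards and moves them as it rebalances. It only illustrates how the choice of policy decides where a document lands.

```python
import bisect
import hashlib

# Range-based: hypothetical split points on a string shard key.
# Shard 0: key < "g"; shard 1: "g" <= key < "p"; shard 2: key >= "p".
RANGE_SPLITS = ["g", "p"]


def range_shard(key: str) -> int:
    # bisect_right counts the split points <= key, which is the shard index.
    return bisect.bisect_right(RANGE_SPLITS, key)


def hash_shard(key: str, num_shards: int) -> int:
    # MD5 the key and fold its first 8 bytes into an integer.
    # Modulo placement is a simplification of MongoDB's chunk ranges.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Location-based: hypothetical zones pinning regions to shards in specific data centers.
ZONES = {"EU": "shard-eu-1", "US": "shard-us-1"}


def zone_shard(doc: dict) -> str:
    # The shard key is assumed to start with a region field that selects the zone.
    return ZONES[doc["region"]]


keys = [f"user{i}" for i in range(1000)]
counts = [0, 0, 0, 0]
for k in keys:
    counts[hash_shard(k, 4)] += 1

print(range_shard("alice"), range_shard("gina"), range_shard("zoe"))
print(counts)
print(zone_shard({"region": "EU", "user_id": 7}))
```

Note the trade-off: under the range policy every `user…` key falls on shard 2 because they sort together (good for range scans, prone to hot spots), while the hash policy spreads the same keys roughly evenly across all four shards.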
How do the MongoDB scaling capabilities stack up to relational databases and key-value stores? Take a look at the table below:
| | MongoDB | Relational | Key-Value |
|---|---|---|---|
| Scale-Out on Commodity Hardware | Yes | No | Yes |
| Shard by Hash | Yes | Manual | Yes |
| Shard by Range | Yes | Manual | No |
| Shard by Location | Yes | Manual | No |
| Automatic Data Rebalancing | Yes | Manual | Limited |
MongoDB scales like crazy. Whether you are sharding to scale data volume, performance or cross-data center operations, you can do it with MongoDB.
MongoDB Consistency & Availability
This section covers 4 topics: Transaction Model, Replica Sets, In-Memory Performance, and Security.
TRANSACTION MODEL
MongoDB provides ACID properties at the document level. One or more fields may be written in a single operation, including updates to multiple sub-documents and elements of an array. The ACID guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document.
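A single update can change several fields, a sub-document and an array at once. The update document below uses MongoDB’s `$set`, `$inc` and `$push` operators and dotted-path notation on a hypothetical article. `apply_update` is a toy model, not MongoDB’s implementation: it mimics the all-or-nothing behavior by applying every operator to a copy and returning the copy only if all of them succeed.

```python
import copy

update = {
    "$set": {"title": "Updated title", "meta.edited": True},
    "$inc": {"views": 1},
    "$push": {"comments": {"author": "li", "text": "Thanks!"}},
}


def _walk(doc: dict, dotted: str):
    # Resolve "a.b.c" to (parent_dict, "c"), creating missing sub-documents.
    parts = dotted.split(".")
    for part in parts[:-1]:
        doc = doc.setdefault(part, {})
    return doc, parts[-1]


def apply_update(doc: dict, upd: dict) -> dict:
    """Toy all-or-nothing update: returns a new document or raises, leaving doc untouched."""
    new_doc = copy.deepcopy(doc)
    for op, fields in upd.items():
        for path, value in fields.items():
            parent, key = _walk(new_doc, path)
            if op == "$set":
                parent[key] = value
            elif op == "$inc":
                parent[key] = parent.get(key, 0) + value
            elif op == "$push":
                parent.setdefault(key, []).append(value)
            else:
                raise ValueError(f"unsupported operator {op}")
    return new_doc


article = {"_id": 1, "title": "Draft", "views": 9, "comments": []}
article = apply_update(article, update)
print(article["title"], article["views"], article["meta"], len(article["comments"]))

# A failing update (incrementing by a non-numeric value) leaves the stored document unchanged,
# even though its $set part would have succeeded on its own.
try:
    article = apply_update(article, {"$set": {"title": "Half-applied?"}, "$inc": {"views": "ten"}})
except TypeError:
    pass
print(article["title"])
```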
Developers can use MongoDB’s Write Concerns to configure operations to be acknowledged to the application only after they have been flushed to the journal file on disk. This is the same model many traditional relational databases use to provide durability guarantees. As a distributed system, MongoDB offers additional flexibility that helps users achieve their desired availability SLAs. Each write operation can specify the appropriate write concern, such as writing to at least two replicas in one data center and one replica in a second data center.
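Write concerns are expressed as small documents: `{"w": 1, "j": True}` requests acknowledgment only after the primary has written the operation to its on-disk journal, and `{"w": "majority", "wtimeout": 5000}` waits for a majority of replica set members, with a timeout in milliseconds. MongoDB configures multi-data-center rules like the one above through replica set member tags and custom write concern modes (`settings.getLastErrorModes` in the replica set configuration). The `satisfies_dc_rule` function and member names below are a hypothetical, simplified model of that check, not the server’s logic.

```python
from collections import Counter

# Write concern documents, as passed to a driver or included in a write command.
JOURNALED = {"w": 1, "j": True}
MAJORITY = {"w": "majority", "wtimeout": 5000}

# Hypothetical rule: at least two acknowledgments from dc1 and one from dc2.
MULTI_DC_RULE = {"dc1": 2, "dc2": 1}


def satisfies_dc_rule(acks, member_dc, rule) -> bool:
    """acks: members that confirmed the write; member_dc: member name -> data center tag."""
    per_dc = Counter(member_dc[m] for m in acks)
    # Counter returns 0 for data centers with no acknowledgments.
    return all(per_dc[dc] >= needed for dc, needed in rule.items())


member_dc = {"a": "dc1", "b": "dc1", "c": "dc1", "d": "dc2", "e": "dc2"}

print(satisfies_dc_rule({"a", "b"}, member_dc, MULTI_DC_RULE))
print(satisfies_dc_rule({"a", "b", "e"}, member_dc, MULTI_DC_RULE))
```

In PyMongo the same options can be attached to a collection with `coll.with_options(write_concern=WriteConcern(w="majority", wtimeout=5000))`, using `WriteConcern` from `pymongo.write_concern`.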
REPLICA SETS
Using native replication, MongoDB maintains multiple copies of data in groups of servers called replica sets. A replica set, which can also serve as a shard in a sharded cluster, is fully self-healing and helps prevent database downtime. Replica failover is fully automated, eliminating the need for administrators to intervene manually.
The number of replicas in a MongoDB replica set is configurable: a larger number of replicas provides increased data availability and protection against database downtime (e.g., in case of multiple machine failures, rack failures, data center failures, or network partitions). Optionally, operations can be configured to write to multiple replicas before returning to the application, thereby providing functionality that is similar to synchronous replication.
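The benefit of adding members can be quantified. A replica set needs a strict majority of its voting members to elect a primary, so with n voting members the majority is floor(n/2) + 1, and the set can lose n - (floor(n/2) + 1) members and still elect a primary. The helper below only computes that arithmetic.

```python
def majority(n: int) -> int:
    # Strict majority of n voting members.
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    # Number of members that can fail while a primary can still be elected.
    return n - majority(n)


for n in range(1, 8):
    print(n, majority(n), fault_tolerance(n))
```

The output shows that going from an odd to the next even member count (3 to 4, 5 to 6) raises the majority without raising fault tolerance, which is why odd numbers of voting members are the usual choice.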
“MongoDB replica sets deliver fault tolerance and disaster recovery. Multi-data center awareness enables global data distribution and separation between operational and analytical workloads.
Replica sets also provide operational flexibility by providing a way to upgrade hardware and software without requiring the database to go offline.”
IN-MEMORY PERFORMANCE WITH ON-DISK CAPACITY
MongoDB makes extensive use of RAM to speed up database operations. Reading data from memory can be roughly 100,000 times faster than a random read from a spinning disk. In MongoDB, all data is read and manipulated through memory-mapped files, and data that is not accessed is not loaded into RAM. Because MongoDB provides in-memory performance, most applications do not need a separate caching layer to scale the database.
To learn more, download our detailed Architecture Guide.
SECURITY
“Data security and privacy are a critical concern in today’s connected world. Data analyzed from new sources such as social media, logs, mobile devices and sensor networks has become as sensitive as traditional transaction data generated by back-office systems.
MongoDB Enterprise features extensive capabilities to defend, detect and control access to data.”
- Authentication. To simplify access control to the database, MongoDB integrates with external security mechanisms including LDAP, Windows Active Directory, Kerberos and x.509 certificates.
- Authorization. User-defined roles enable administrators to configure granular permissions for a user or application, based on the privileges they need to do their job. Additionally, field-level redaction can work with trusted middleware to manage access to individual fields within a document, allowing the co-location of data with multiple security levels for ease of development and operation.
- Auditing. For regulatory compliance, security administrators can use MongoDB’s native audit log to track access and administrative actions taken against the database.
- Encryption. MongoDB data can be encrypted on the network and on disk. Support for SSL allows clients to connect to MongoDB over an encrypted channel.
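User-defined roles are created with the `createRole` database command, which lists privileges as a resource (database and collection) plus the actions allowed on it. The role name `reportReader` and the `reports` database below are hypothetical, and `is_allowed` is a simplified model of how such a privilege grants an action, not MongoDB’s authorization code.

```python
# Command document for a read-only reporting role (hypothetical names).
create_role = {
    "createRole": "reportReader",
    "privileges": [
        # An empty collection name covers every collection in the "reports" database.
        {"resource": {"db": "reports", "collection": ""}, "actions": ["find"]},
    ],
    "roles": [],  # no inherited roles
}


def is_allowed(role_cmd: dict, db: str, collection: str, action: str) -> bool:
    # Grant the action if any privilege matches the database, collection and action.
    for priv in role_cmd["privileges"]:
        res = priv["resource"]
        coll_ok = res["collection"] in ("", collection)
        if res["db"] == db and coll_ok and action in priv["actions"]:
            return True
    return False


print(is_allowed(create_role, "reports", "daily", "find"))
print(is_allowed(create_role, "reports", "daily", "insert"))
print(is_allowed(create_role, "billing", "invoices", "find"))
```

With PyMongo, an administrator could send this command document as `db.command(create_role)` on the database where the role should be defined.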
To learn more, download our MongoDB Security Reference Architecture
MANAGEMENT & OPERATIONS
This section covers 6 topics: MMS, Deployments and Upgrades, Monitoring, Disaster Recovery, Integration, and Cost Savings.
MongoDB Management Service (MMS) is the platform for managing MongoDB, created by the engineers who develop the database. Available as a managed service in the cloud or as an on-prem deployment with MongoDB Enterprise, MMS provides an integrated suite of applications that manage the complete lifecycle of the database:
- Automated provisioning and management with a single click and zero-downtime upgrades;
- Proactive monitoring providing visibility into the performance of MongoDB, history, and automated alerting on 100+ system metrics;
- Disaster recovery with continuous, incremental backup and point-in-time recovery.
Each of these is explained in more detail below.
DEPLOYMENTS AND UPGRADES
MMS helps operations teams deploy MongoDB through a powerful self-service portal. The deployment could be a single instance, a replica set or a sharded cluster, in the public cloud or in your private data center. MMS enables fast deployment on any hosting topology.
MMS self-service portal: simple, intuitive and powerful. Provision and upgrade entire clusters with a single click.
In addition to initial deployment, MMS enables capacity to be dynamically scaled by adding shards and replica set members to running systems. Other maintenance tasks, such as upgrades or resizing the oplog, can be performed with a few clicks and zero downtime.
MONITORING
MMS gives developers, administrators and operations teams visibility into the MongoDB service. Featuring charts, custom dashboards, and automated alerting, MMS tracks 100+ key database and system health metrics, including operation counters, memory and CPU utilization, replication status, open connections, queues, and node status.
The metrics are securely reported to MMS, where they are processed, aggregated, alerted on and visualized in a browser, letting administrators easily determine the health of MongoDB in real time. Historic performance can be reviewed to establish operational baselines and to support capacity planning for further scale. Integration with existing monitoring tools is also straightforward via the MMS API.
“MMS provides real time & historic visibility into MongoDB with integration into operational tools”
“Alerts enable proactive management of MongoDB”
DISASTER RECOVERY
A backup and recovery strategy is necessary to protect your mission-critical data against catastrophic failure, such as a fire or flood in your data center, or human error, such as unintentional corruption due to mistakes in application code, or accidental deletion of data. With a backup and recovery strategy in place, administrators can restore business operations with minimal data loss and the organization can meet regulatory and compliance requirements.
MMS is the only backup solution for MongoDB with continuous incremental backup, point-in-time recovery of replica sets, and consistent snapshots of sharded clusters. MMS creates snapshots of MongoDB data and retains multiple copies based on a user-defined retention policy.
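Point-in-time recovery can be pictured as restoring the most recent snapshot taken before the target time and then replaying the logged operations up to that time. The sketch below is a simplified model with hypothetical data structures (a snapshot dict and a time-ordered list of operations); it is not how MMS stores or replays backups.

```python
from typing import Dict, List, Tuple

# Snapshot of a tiny key-value view of the data, taken at time 100.
SNAPSHOT_TS = 100
snapshot: Dict[str, int] = {"a": 1, "b": 2}

# Operations recorded after the snapshot, sorted by time: (timestamp, op, key, value).
oplog: List[Tuple[int, str, str, int]] = [
    (105, "set", "c", 3),
    (110, "set", "a", 10),
    (120, "delete", "b", 0),  # e.g. an accidental deletion; value unused
]


def restore_to(target_ts: int) -> Dict[str, int]:
    """Rebuild the state as of target_ts (assumed >= SNAPSHOT_TS)."""
    state = dict(snapshot)  # start from a copy of the snapshot
    for ts, op, key, value in oplog:
        if ts <= SNAPSHOT_TS:
            continue  # already reflected in the snapshot
        if ts > target_ts:
            break  # the log is time-ordered, so nothing later applies
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


print(restore_to(115))
print(restore_to(130))
```

Restoring to time 115 recovers the state from just before the accidental deletion at time 120, which a snapshot-only backup taken at time 100 could not do without losing the writes at 105 and 110.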
How do the MongoDB operational capabilities stack up to relational databases and key-value stores? Take a look at the table below:
| | MongoDB | Relational | Key-Value |
|---|---|---|---|
| Self Healing Recovery with Automatic Failover | Yes | Often Requires Additional Clustering Software | No: Manual Failover Often Recommended |
| Separate Caching Layer Required | No | Often | No |
| Data Center Awareness | Yes | Expensive Add-On | No |
| Continuous Backup & Point in Time Recovery | Yes | Yes | No |
| API Integration with Systems Management Frameworks | Yes | Yes | No |
INTEGRATING MONGODB WITH EXTERNAL MONITORING SOLUTIONS
The MMS API provides programmatic access to key monitoring data and lets external management tools use MMS’s features.
In addition to MMS, MongoDB Enterprise can report system information via SNMP traps, supporting centralized data collection and aggregation via external monitoring solutions.
To learn more about operational best practices, download our Operations Guide
COST SAVINGS
MongoDB can be 1/10th the cost to build and run, compared to a relational database. The cost advantage is driven by:
- MongoDB’s increased ease of use and developer flexibility, which reduces the cost of developing and operating an application;
- MongoDB’s ability to scale on commodity server hardware and storage;
- MongoDB’s substantially lower prices for commercial licensing, advanced features and support.
Furthermore, MongoDB’s technical and cost-related benefits translate to topline advantages as well, such as faster time-to-market and time-to-scale.
To learn more, download our TCO comparison of Oracle and MongoDB