Case Study on MongoDB

What is MongoDB? Why MongoDB?

Tamim Dalwai
10 min read · Sep 26, 2021

What is MongoDB?

MongoDB is a document-oriented NoSQL database used for high-volume data storage. Instead of the tables and rows of traditional relational databases, MongoDB uses collections and documents. Documents consist of key-value pairs and are the basic unit of data in MongoDB. Collections contain sets of documents and function as the equivalent of relational database tables. MongoDB emerged in the late 2000s, with its first public release in 2009.

MongoDB Features:

  1. Each database contains collections, which in turn contain documents. Documents in the same collection can differ from one another in size, content, and number of fields.
  2. The document structure is more in line with how developers construct their classes and objects in their respective programming languages. Developers will often say that their classes are not rows and columns but have a clear structure with key-value pairs.
  3. The rows (or documents, as they are called in MongoDB) don’t need to have a schema defined beforehand. Instead, fields can be created on the fly.
  4. The data model available within MongoDB allows you to represent hierarchical relationships, to store arrays, and other more complex structures more easily.
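The features above can be pictured with plain Python dictionaries. This is only a sketch: the "collection" is an ordinary list, no MongoDB server or driver is involved, and every field name and value is invented for illustration.

```python
import json

# A "collection" modelled as a plain list of dict "documents".
# All names and values below are invented for illustration.
customers = [
    {"CustomerID": 11, "CustomerName": "Asha Rao", "OrderID": 111},
    {
        "CustomerID": 22,
        "CustomerName": "Ben Ortiz",
        # An array of embedded documents: a hierarchy stored in one record.
        "Addresses": [
            {"type": "billing", "city": "London"},
            {"type": "shipping", "city": "Leeds"},
        ],
    },
]


def field_names(collection):
    """Return the sorted top-level field names of each document."""
    return [sorted(document) for document in collection]


# The two documents share a collection but not a set of fields.
fields_per_document = field_names(customers)

# Documents map naturally onto structured text such as JSON.
as_json = json.dumps(customers[1], indent=2)
```

The second document carries an array of nested address records that the first one lacks, which is the kind of structure a fixed set of columns cannot hold without extra tables.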

Key Components of MongoDB Architecture:

Below are a few of the common terms used in MongoDB:

  1. _id: This is a field required in every MongoDB document. It holds a value that is unique within the collection, so it acts like the document’s primary key. If you create a new document without an _id field, MongoDB will automatically create one. For example, in a customer collection, MongoDB adds an ObjectId, displayed as a 24-character hexadecimal string, to each document inserted without an _id.
  2. Collection: This is a grouping of MongoDB documents. A collection is the equivalent of a table in an RDBMS such as Oracle or MS SQL. A collection exists within a single database. As seen in the introduction, collections don’t enforce any sort of structure by default.
  3. Cursor: This is a pointer to the result set of a query. Clients can iterate through a cursor to retrieve results.
  4. Database: This is a container for collections, just as a database in an RDBMS is a container for tables. Each database gets its own set of files on the file system. A MongoDB server can store multiple databases.
  5. Document: A record in a MongoDB collection is called a document. A document, in turn, consists of field names and values.
  6. Field: A name-value pair in a document. A document has zero or more fields. Fields are analogous to columns in relational databases. For example, CustomerID and 11 form one of the key-value pairs defined in a customer document.
  7. JSON: This stands for JavaScript Object Notation, a human-readable, plain-text format for expressing structured data. JSON is supported in many programming languages.

Just a quick note on the key difference between the _id field and a normal document field: the _id field uniquely identifies each document in a collection, and MongoDB adds it automatically when a document is inserted without one.
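To make the _id behaviour concrete, here is a small sketch that imitates it in plain Python. It is not the driver's implementation. The helper names (`new_object_id_hex`, `insert_one`) are hypothetical, and the 12-byte layout (4-byte timestamp, 5-byte random value, 3-byte counter) follows the documented ObjectId format only loosely.

```python
import itertools
import os
import time

# Per-process random value and a counter with a random start, loosely
# following the documented ObjectId layout (assumption: simplified).
_process_random = os.urandom(5)
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def new_object_id_hex():
    """Build a 12-byte identifier and return it as 24 hex characters."""
    timestamp = int(time.time()).to_bytes(4, "big")
    counter = (next(_counter) % 0x1000000).to_bytes(3, "big")
    return (timestamp + _process_random + counter).hex()


def insert_one(collection, document):
    """Store a copy of the document, adding an _id only if it is missing."""
    stored = dict(document)
    stored.setdefault("_id", new_object_id_hex())
    collection.append(stored)
    return stored["_id"]


customers = []
generated_id = insert_one(customers, {"CustomerID": 11})
explicit_id = insert_one(customers, {"_id": "custom-key", "CustomerID": 22})
```

The first insert receives a generated 24-character identifier, while the second keeps the _id supplied by the caller.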

🍃Why Use MongoDB?

Below are a few of the reasons to start using MongoDB:

1. Document-oriented

Since MongoDB is a NoSQL database, it stores data in documents rather than in a relational format. This makes MongoDB very flexible and adaptable to real-world business situations and requirements.

2. Ad hoc queries

MongoDB supports searching by field, range queries, and regular expression searches. Queries can be made to return specific fields within documents.
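The three kinds of search, plus returning only specific fields, can be sketched over an in-memory list. This illustrates the idea rather than MongoDB's query language: the `find` helper and the sample articles are hypothetical, and in MongoDB itself such conditions are written as filter documents using operators such as `$gte`, `$lt` and `$regex`.

```python
import re

# Invented sample documents.
articles = [
    {"_id": 1, "title": "Cloud migration basics", "views": 1200, "section": "tech"},
    {"_id": 2, "title": "Rich list 2021", "views": 98000, "section": "lists"},
    {"_id": 3, "title": "Cloud costs explained", "views": 4300, "section": "tech"},
]


def find(collection, predicate, fields=None):
    """Return matching documents, optionally keeping only the named fields."""
    results = []
    for document in collection:
        if predicate(document):
            if fields is None:
                results.append(document)
            else:
                results.append({k: document[k] for k in fields if k in document})
    return results


# Search by field value.
by_field = find(articles, lambda d: d["section"] == "tech")
# Range query, returning only the title field.
by_range = find(articles, lambda d: 1000 <= d["views"] < 5000, fields=["title"])
# Regular expression search, returning only the _id field.
by_regex = find(articles, lambda d: re.search(r"^Cloud", d["title"]), fields=["_id"])
```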

3. Indexing

Indexes can be created to improve the performance of searches within MongoDB. Any field in a MongoDB document can be indexed.
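Why an index speeds up a search can be shown with a simple lookup table. Note the assumption: this sketch uses a Python dictionary, which only supports exact matches, whereas MongoDB's indexes are ordered structures that also serve range queries. The function names are hypothetical.

```python
from collections import defaultdict

# Invented sample documents.
articles = [
    {"_id": 1, "section": "tech"},
    {"_id": 2, "section": "lists"},
    {"_id": 3, "section": "tech"},
]


def build_index(collection, field):
    """Map each value of the field to the positions of documents holding it."""
    index = defaultdict(list)
    for position, document in enumerate(collection):
        if field in document:
            index[document[field]].append(position)
    return index


def find_by_index(collection, index, value):
    """Fetch matching documents directly, without scanning the collection."""
    return [collection[position] for position in index.get(value, [])]


section_index = build_index(articles, "section")
tech_articles = find_by_index(articles, section_index, "tech")
```

The index is built once and then answers each lookup without visiting every document, at the cost of extra storage and of keeping it up to date on writes.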

4. Replication

MongoDB can provide high availability with replica sets. A replica set consists of two or more MongoDB instances, and any member may act as the primary or a secondary at a given time. The primary is the main server: it receives all write operations and, by default, serves reads as well. The secondaries maintain a copy of the primary’s data using built-in replication. When the primary fails, the replica set automatically elects one of the secondaries, which then becomes the new primary.
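The failover behaviour can be sketched as a toy model. It is heavily simplified: real replica sets replicate asynchronously and choose a new primary through a vote among members, whereas this sketch copies writes immediately and promotes the first healthy member. The class and node names are hypothetical.

```python
class Member:
    """One server in the replica set, holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.documents = []


class ReplicaSet:
    def __init__(self, names):
        self.members = [Member(name) for name in names]
        self.primary = self.members[0]

    def write(self, document):
        """Store the write on the primary and copy it to healthy secondaries."""
        if not self.primary.healthy:
            self.elect_primary()
        for member in self.members:
            if member.healthy:
                member.documents.append(dict(document))

    def elect_primary(self):
        """Promote the first healthy member (real elections are vote-based)."""
        for member in self.members:
            if member.healthy:
                self.primary = member
                return member
        raise RuntimeError("no healthy member left to become primary")


replica_set = ReplicaSet(["node-a", "node-b", "node-c"])
replica_set.write({"headline": "first"})
replica_set.primary.healthy = False  # simulate the primary failing
replica_set.write({"headline": "second"})  # triggers failover, then writes
```

After the simulated failure, `node-b` takes over as primary and still holds both documents, so clients keep working against a complete copy of the data.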

5. Load balancing

MongoDB uses the concept of sharding to scale horizontally by splitting data across multiple MongoDB instances. MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure.
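Splitting data across instances comes down to a routing rule that maps each document's shard key to one shard. The sketch below hashes the key and takes it modulo the shard count. That is an assumption made for brevity: MongoDB assigns ranges of key values (or of hashed key values) to shards rather than using a plain modulo, and the function names here are hypothetical.

```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Pick a shard number deterministically from the shard key value."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(documents, key, shard_count):
    """Place every document on the shard chosen by its key."""
    shards = [[] for _ in range(shard_count)]
    for document in documents:
        shards[shard_for(document[key], shard_count)].append(document)
    return shards


# Invented sample data: 1,000 readers spread over three shards.
readers = [{"reader_id": n} for n in range(1000)]
shards = distribute(readers, "reader_id", 3)
shard_sizes = [len(shard) for shard in shards]
```

Because the rule is deterministic, a query for a given `reader_id` can be sent straight to the one shard that holds it instead of to every server.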

CASE STUDY: FORBES

In 1996 Forbes launched its first website.

It was one of the first business publications in the world to do so: the original digital transformation.

In the 25 years since, Forbes has only accelerated its efforts and is widely considered to set the standard for digital innovation in the publishing industry. The 100-year-old publisher, famous for its business journalism and rich lists, has become the largest business media brand in the world. It reaches more than 140 million people worldwide every month across a number of online and offline channels.

In just six months, Forbes migrated its platform to Google Cloud and MongoDB Atlas. Results include:

  • 58% faster build time for new products and fixes
  • Accelerated release cycle by 4x
  • Reduced total cost of ownership by 25%
  • 28% increase in subscriptions from new newsletters

During the pandemic the cloud infrastructure has also helped the website scale to an extraordinary number of users and helped the team stay nimble, introducing and testing a number of new features.

From June to December 2019, traffic continued to grow, setting new records month after month. Then came COVID-19. Like many high-profile publications, Forbes saw its coverage of the pandemic drive a further increase in traffic, reaching a record in May 2020 with more than 120 million unique visitors.

Despite the turmoil and the unprecedented digital traffic it created, the development team continued to build, launch and test industry-leading features, such as AI-assisted technology that recommends stories to its journalists and new data analysis tools to better understand reader behavior.

This ease of adaptation was powered by another change that had come just in time. At the end of 2019, Forbes finished the first stage of a comprehensive migration to the cloud. It included moving much of its transactional workload to MongoDB Atlas, the global cloud database service, and to Google Cloud.

“We’re very glad we moved to the cloud when we did. Shifting quickly to Google Cloud and MongoDB Atlas put us in a position to innovate and thrive even in the most difficult circumstances.”

Vadim Supitskiy, Chief Technology Officer at Forbes

Rewritten Rules

The cloud migration was only the most recent step in Forbes’ 25-year digital transformation. A key element that helped the migration was a change made almost a decade ago. In 2011 the Forbes CMS (Content Management System) was completely re-written to enable a brand new contributor network strategy. The database they chose to build it on: a fledgling NoSQL database called MongoDB.

MongoDB’s document model meant developers could now build new features quickly, easily incorporate changes, and better handle a growing diversity of data types. The CMS was delivered in less than two months. The modernised architecture helped unlock new initiatives, including a massive network of new contributors and live social-media analysis which in turn improved engagement with Forbes’ content.

Transformation Never Ends

The Forbes development team has always served three groups of important users: the readers, the journalists and the advertisers. All three appreciated the changes but they also had an insatiable appetite for better and different solutions. As years went by it became clear that Forbes would need to continue to adapt if it wanted to remain an industry leader. The company needed to further improve the user experience, expand the possibilities for journalists and deliver even better advertising collaborations. It couldn’t deliver that efficiently with the existing architecture (see image 1).

So in 2018 Vadim and his team sketched out their vision: a cloud-native architecture that abstracted away almost all of the management of services. This would make it much easier to scale to handle large volumes of activity. It would also allow developers to build awesome new stuff and do it securely, quickly and with minimal overhead.

“We did not want to be in the database management business,” explained Vadim.

COVID and the Cloud

Jump ahead two years: February of 2020 and the COVID-19 pandemic is the biggest story of a generation and a crisis for almost every business. Forbes had not been idle in those intervening years. Vadim had insisted on an ‘aggressive timeline’. The first stage of the cloud migration had already finished in late 2019 and had taken just six months to complete.

The centerpiece was a move to the cloud database service MongoDB Atlas, hosted on Google Cloud. But before they pushed everything live, they did something that not enough companies do: test, test, test.

It was during that load testing and Quality Assurance (QA) phase that Forbes discovered a critical dependency: there was unacceptably high latency between the datacenter and the cloud. The round trip for data access would have been so slow that the resulting multiplier effect would have created a terrible user experience. To solve this, they architected a phased rollout by breaking down the service transfer so that the core applications and databases all moved in one shot.

Once in place, the team used the new infrastructure to create an abstraction layer so that most services don’t touch the database directly. Instead, Forbes uses an intermediate service called the Content API, which provides a stable interface on top of the more fluid data structures hosted within MongoDB Atlas. This decouples the format of the data from the requirements of the services using it. Services are no longer bound to the data schema: a change to one data structure in one place doesn’t break anything (or anyone) elsewhere in the stack.
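The decoupling described here can be illustrated with a small facade. To be clear, this is not Forbes’ Content API, whose design the case study does not show. The class, method and field names below are all hypothetical, chosen only to show how callers can keep receiving one stable shape while the stored documents change.

```python
# Two stored shapes for the same kind of record: the second reflects a
# later, hypothetical schema change (a renamed field and a nested author).
stored_articles = {
    "a1": {"title": "Markets today", "author": "J. Doe"},
    "a2": {"headline": "Cloud at scale", "author": {"name": "R. Lee"}},
}


class ContentAPI:
    """Hypothetical facade: callers see one stable shape, whatever is stored."""

    def __init__(self, store):
        self._store = store

    def get_article(self, article_id):
        raw = self._store[article_id]
        author = raw.get("author")
        if isinstance(author, dict):  # newer documents nest the author
            author = author.get("name")
        return {
            "id": article_id,
            "title": raw.get("title", raw.get("headline")),
            "author": author,
        }


api = ContentAPI(stored_articles)
old_shape = api.get_article("a1")
new_shape = api.get_article("a2")
```

Both calls return the same three keys, so a service built against `get_article` needs no change when the underlying document layout evolves.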

Vadim explained: “We were now abstracted enough to focus solely on value delivery”. Developers no longer had to spend time on maintaining, managing and provisioning infrastructure.

The new infrastructure was immediately put to the test.

First, there was the scale. On top of record numbers in late 2019, the pandemic drove further increases in usage, reaching new record traffic with 121 million users in May. The website and the reader experience never wavered. Next, the business wanted to capitalize on the greater interest and roll out new customer-facing features. Freed to focus on value, the developers helped Forbes launch seven new newsletters which increased the subscription rate by 28%, a key business metric. There were also brand new Forums, new video products and a doubling in size of the breaking news team.

Machine Learning for Journalists

The combination of MongoDB Atlas within Google Cloud’s native microservice architecture would also prove a wise choice. Firstly, Google Cloud’s Kubernetes Engine made it more manageable to orchestrate Forbes’ horde of more than 50 microservices: focused, self-contained codebases that allow each service to be easily understood, modified quickly without dependencies on other services, and built with the best technology for the task. Secondly, Atlas was able to work seamlessly with Google Cloud’s suite of services to build even more powerful tools for Forbes’ writers.

In April, Forbes introduced a trending story recommendation engine for journalists. The engine scrapes the internet for trending stories and uses Google Cloud’s machine learning to make suggestions to appropriate contributors, either through a Slack bot or through the custom CMS (the CMS itself had been rewritten again in 2019 to become more of an AI- and analytics-driven platform). Like everything at Forbes, the engine is still undergoing heavy testing, but it has proved instantly helpful to Forbes’ editorial team and contributor network.

Entrepreneurial Developers

The Forbes development team is as entrepreneurial as the people that grace its magazine covers or homepage. They know cloud migrations and innovations are only as great as the business results they generate.

Forbes’ cloud migration led to a 58% improvement in build speed and the release cycle has improved 2x to 10x (depending on the service). Other efficiencies that came with the migration resulted in a 25% reduction in total cost of ownership.

Forbes is already looking ahead to the next era of publishing. The new cloud infrastructure will be immediately tasked with enabling improved personalization, loyalty and first-party data management.

“Digital transformation is never finished,” Vadim points out. Considering they’re 25 years into this, it’s hard to disagree.

Thank You …
