“How Facebook is managing and manipulating Big Data”

Tamim Dalwai
4 min read · Sep 17, 2020

What is Big Data?

Let’s first understand what is meant by data.

Data is the quantities, characters, or symbols on which a computer performs operations, and which may be stored and transmitted as electrical signals and recorded on magnetic, optical, or mechanical recording media.

Then what is Big Data?

Big Data is also data, but of a huge size. The term describes a collection of data that is huge in volume and still growing exponentially over time. In short, such data is so large and complex that traditional data management tools cannot store or process it efficiently.

What are the problems associated with Big Data?

A) Volume: the amount of data that businesses can collect is enormous, so the sheer volume of data becomes a critical factor in Big Data analytics.

B) Velocity: new data is generated at a very high rate, thanks to our dependence on the internet, sensors, and machine-to-machine communication, so Big Data must be parsed in a timely manner.

C) Variety: the data that is generated is heterogeneous. It can arrive in many formats, such as video, text, database records, numeric data, and sensor data, so understanding the type of data is key to unlocking its value.

D) Veracity: knowing whether the available data comes from a credible source is of utmost importance before interpreting and using Big Data for business needs.

How does Facebook manage Big Data?

We all know that Facebook is the most popular social media platform in the world, with billions of users. It offers many features, such as creating a free account and creating and uploading as many posts as you want.

Did you know that Facebook’s systems process 2.5 billion pieces of content and 500+ terabytes (1 terabyte = 1,000 gigabytes) of data each day? They pull in 2.7 billion Like actions and 300 million photos per day, and scan roughly 105 terabytes of data every half hour.
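To get a feel for these numbers, here is a small back-of-the-envelope calculation. It is only a rough sketch: it takes the figures quoted above at face value, treats "500+" as exactly 500 terabytes, and assumes the load is spread evenly across the day, which real traffic is not. The variable names are made up for this illustration.

```python
# Figures quoted in the article (decimal units: 1 TB = 1,000 GB).
TB_IN_GB = 1000
SECONDS_PER_DAY = 24 * 60 * 60
HALF_HOURS_PER_DAY = 48

ingested_tb_per_day = 500          # "500+ terabytes" taken as a lower bound
scanned_tb_per_half_hour = 105
content_items_per_day = 2.5e9

# Assumption: load is spread evenly over the day.
ingest_gb_per_second = ingested_tb_per_day * TB_IN_GB / SECONDS_PER_DAY
scanned_tb_per_day = scanned_tb_per_half_hour * HALF_HOURS_PER_DAY
items_per_second = content_items_per_day / SECONDS_PER_DAY

print(f"Ingest rate: about {ingest_gb_per_second:.2f} GB per second")
print(f"Data scanned: about {scanned_tb_per_day} TB per day")
print(f"Content items: about {items_per_second:,.0f} per second")
```

In other words, the quoted figures work out to several gigabytes of new data every second, and the amount of data scanned in a day is roughly ten times the amount ingested.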

That is a massive amount of data! So how does Facebook manage it?

“Facebook runs the world’s largest Hadoop cluster,” says Jay Parikh (Vice President of Infrastructure Engineering, Facebook).

Facebook’s Hadoop cluster spans more than 4,000 machines and stores hundreds of millions of gigabytes.

Hadoop gives Facebook a common infrastructure that is both efficient and reliable. From search, log processing, recommendation systems, and data warehousing to video and image analysis, Hadoop powers this social networking platform in many ways.

Let’s explore more about Hadoop

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
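The MapReduce programming model is easiest to understand through the classic word-count example. The sketch below is plain Python running in a single process. It does not use Hadoop or any Hadoop API, and the function names are invented for this illustration. It only shows the three steps of the model: map each input to key-value pairs, group the pairs by key, then reduce each group to a result.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is growing"])
print(counts)
```

In a real cluster, the map and reduce steps would run in parallel on many machines, each working on its own slice of the data, and the framework would handle the grouping step across the network. That is what lets the same simple model scale to very large inputs.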

Hadoop is valuable to everyone from large multinational companies to small startups. It helps businesses store and process massive amounts of data without purchasing expensive hardware. It has changed how the big players operate and supports a large ecosystem of solutions such as log processing, recommendation systems, data warehousing, and fraud detection.

Some Hadoop users

Yahoo uses Hadoop for content optimization, search indexing, ad optimization, and content feed processing.

The New York Times has used Hadoop to generate PDF files from published articles.

Many popular e-commerce companies use Hadoop to track user behaviour.

Hadoop is also used in other areas such as customer segmentation and experience analysis, credit risk assessment, and targeted services.

Conclusion

Hadoop is an answer to the traditional storage and computing problem. Traditional database management systems are hard to scale, offer limited fault tolerance, and become very slow at fetching records as the data size increases.
