What is Big Data?
We all use electronic devices such as smartphones, laptops, and desktop computers, but we rarely think about how much data those devices generate in the form of files, music, pictures, videos, and more. For a sense of scale: according to Social Media Today, around 2.5 quintillion bytes of data are produced every day. And the count doesn’t stop there. According to Raconteur, by 2025 we will generate around 463 exabytes of data each day.
Massive, isn’t it?
This massive amount of data is known as Big Data. Big Data is a collection of data so large, and growing so quickly over time, that traditional data management and computing tools cannot store or process it efficiently.
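To put the two daily figures quoted above on the same scale, here is a minimal sketch that converts raw byte counts into decimal (SI) units. The helper name `humanize` and the variable names are illustrative assumptions, and the figures are simply the ones cited in the text, not independently verified.

```python
# Decimal (SI) byte units: each step up is a factor of 1000.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def humanize(num_bytes: float) -> str:
    """Express a byte count in the largest SI unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


# Figures as quoted in the article (assumed, not re-measured).
daily_now = 2.5e18    # 2.5 quintillion bytes per day
daily_2025 = 463e18   # 463 exabytes per day

print(humanize(daily_now))
print(humanize(daily_2025))
print(f"growth factor: about {daily_2025 / daily_now:.0f}x")
```

One quintillion bytes is one exabyte, so the first figure is 2.5 exabytes per day, which makes the two numbers directly comparable.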
Data or Big Data?
Data qualifies as Big Data when it exhibits the five Vs: Volume, Velocity, Variety, Veracity, and Value.
- Volume: Volume refers to the amount of data generated. Size is the first thing that distinguishes Big Data from ordinary data.
- Velocity: Velocity refers to the speed at which data is generated.
- Variety: Variety refers to the types of data available. Depending on how it is generated, data can be structured, unstructured, or semi-structured.
- Veracity: Veracity refers to the accuracy and trustworthiness of the data.
- Value: Value refers to how effectively the data can be put to use.
If the data has all the qualities mentioned above, it can be considered Big Data.
What are the Types of Big Data?
Big Data can be in Structured form, Unstructured form, or Semi-Structured form.
Structured Data: Structured data is data that is already stored in an orderly, predefined format, typically in databases, which makes it the easiest kind to use in computer-related activities such as programming. It may be either machine-generated or human-generated.
Notable sources of structured data include medical devices, web server logs, online transaction processing platforms, sensors, spreadsheets, and SQL databases. Structured data is commonly estimated to account for only about 20% of all data, but its high degree of organization and performance makes it the foundation of Big Data.
Unstructured Data: Unlike structured data, unstructured data has no fixed format for storage. It is commonly estimated to account for the remaining 80% of all data. Like structured data, it can be either machine-generated or human-generated.
Notable sources of unstructured data include web pages, images, videos, audio, reports, surveys, Word documents, and PowerPoint presentations.
Semi-Structured Data: Data that does not conform to a rigid data model, such as relational tables, but still has some structure is semi-structured data.
Data in many NoSQL stores falls into this category because it carries keys or tags that make each document easy to process.
Notable sources of semi-structured data include e-mails, zipped files, markup languages, TCP/IP packets, and binary executables.
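The difference between the three forms is easiest to see when the same fact is stored three ways. The sketch below is only an illustration: the order record, field names, and regular expression are invented for this example, and it uses nothing beyond Python's standard library.

```python
import csv
import io
import json
import re

# Structured: fixed columns, and every row has the same fields.
csv_text = "order_id,customer,amount\n1001,Asha,49.90\n"
row = next(csv.DictReader(io.StringIO(csv_text)))
structured_amount = float(row["amount"])

# Semi-structured: self-describing keys, with optional or nested fields.
json_text = (
    '{"order_id": 1001, "customer": {"name": "Asha"}, '
    '"amount": 49.90, "tags": ["gift"]}'
)
doc = json.loads(json_text)
semi_structured_amount = doc["amount"]

# Unstructured: free text, so the value has to be extracted heuristically.
note = "Asha called about order 1001 and paid 49.90 by card."
match = re.search(r"paid (\d+\.\d+)", note)
unstructured_amount = float(match.group(1)) if match else None

print(structured_amount, semi_structured_amount, unstructured_amount)
```

The amount is read by column name in the structured case and by key in the semi-structured case, while the unstructured case depends on a pattern that breaks as soon as the wording changes.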
How Do Storage and Analytics of Big Data Take Place?
For efficient storage and analytics, Big Data uses multiple frameworks like Cassandra, Hadoop, and Spark.
- Cassandra: Cassandra is an open-source NoSQL database that handles large amounts of data across many commodity servers and provides high availability with no single point of failure. Its design draws on Amazon Dynamo and Google Bigtable. It scales across nodes in more than one data centre to increase data availability, and it is designed to be always on and to deliver consistent performance.
- Hadoop: Hadoop is an open-source framework that was developed in 2006 by Doug Cutting and is managed by the Apache Software Foundation. It stores and processes massive volumes of data efficiently. It comprises two main components: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS is responsible for storing and managing the data in a Hadoop cluster, whereas MapReduce is responsible for processing the data stored in HDFS.
- Apache Spark: Apache Spark is a fast data processing engine that lets users efficiently run machine learning or SQL workloads that require fast, iterative access to datasets. It schedules and distributes an application’s many computational tasks across many Spark worker machines.
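The MapReduce processing model mentioned above can be sketched in plain Python with the classic word-count example. This is a single-process illustration of the map, shuffle, and reduce steps, not Hadoop's actual API; the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions made for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map over every line, shuffle by word, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


lines = ["big data is big", "data is everywhere"]
counts = word_count(lines)
print(counts)
```

In a real cluster the map calls would run in parallel on the machines that hold each block of input, and the shuffle would move data over the network. The sketch keeps everything in one process so that the three steps stay visible.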
What are the applications of Big Data?
- Weather Forecasting: Big Data makes it possible to collect a massive amount of data, including historical weather reports, climate change details, wind direction, precipitation levels, and other relevant factors. Analyzing this information helps produce more accurate predictions of weather and natural calamities.
- Media and Entertainment: Media and entertainment companies collect information such as our browsing history, cookies, and purchase details, analyze it, and display advertisements accordingly. Big Data makes this possible.
- Health Care: Big Data helps shorten the time needed for medical research and discoveries. It helps analyze patients’ medical histories and records for personalized treatment, and it also helps detect medical fraud and identity theft.
- Logistics: Big Data facilitates the transportation and storage of goods. By analyzing critical logistics factors, it supports flexible routing, capacity planning, efficient warehousing, and customer satisfaction.
- Tourism: Big Data analyzes critical factors in the tourism sector, such as occupancy rates, hotels, tariffs, and peak seasons, to support revenue management, market research, personalized offers, and investment opportunities.
What are your views on Big Data? Have any other applications of Big Data in mind? Have any interesting facts to share relating to the topic? Feel free to share with us in the comment section.