Can You Imagine How Big Companies Manage the Huge Amounts of Data They Generate Every Day, i.e., Big Data?

Backbencher05
7 min read · Sep 18, 2020

In this article we are going to discuss:

  • What data and big data are
  • The problems big companies face in storing the huge amounts of data that come in through various channels
  • How these companies manage and store their data
  • Which mindset or architecture they use
  • The solution to this problem

Data

Data are measured, collected, reported and analyzed, whereupon they can be visualized using graphs, images or other analysis tools. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing.

In computing, data means the quantities, characters or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical or mechanical recording media.

Big Data

To really understand big data, it’s helpful to have some historical background. Big data is data that contains greater variety arriving in increasing volumes and with ever-higher velocity. This is known as the three Vs.

Put simply, big data is larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software just can’t manage them. But these massive volumes of data can be used to address business problems you wouldn’t have been able to tackle before.

Big data is also data, but of enormous size. Big data is a term used to describe a collection of data that is huge in volume and yet growing exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.

Importance of big data

Companies use the big data accumulated in their systems to improve operations, provide better customer service, create personalized marketing campaigns based on specific customer preferences and, ultimately, increase profitability. Businesses that utilize big data hold a potential competitive advantage over those that don’t since they’re able to make faster and more informed business decisions, provided they use the data effectively.

For example, big data can provide companies with valuable insights into their customers that can be used to refine marketing campaigns and techniques in order to increase customer engagement and conversion rates.

How much data is produced every day?

The amount of data is growing exponentially. Today, our best estimates suggest that at least 2.5 quintillion bytes of data are produced every day (that’s 2.5 followed by a staggering 18 zeros!). That’s everything from data collected by the Curiosity rover on Mars to your Facebook photos from your latest vacation.

Social Media and Big Data

With the advent and increased use of the internet, social media has become an integral part of people’s daily routine. Social media is not only used to connect with others, but it has become an effective platform for businesses to reach their target audience. With the emergence of big data, social media marketing has reached an altogether new level. It is estimated that by 2020 the accumulated volume of big data will reach 44 trillion gigabytes. With such an enormous amount of data available, marketers are able to utilize it to get actionable insights for framing efficient social media marketing strategies.

All the status updates, photos and videos posted by users on their social networks contain useful information about their demographics, likes, dislikes, etc. Businesses are utilizing this information in numerous ways, managing and analyzing it to get a competitive edge. Marketers use big data to plan future social media campaigns by learning everything they need to know about their potential customers before approaching them. This post will shed light on the application of big data to social media marketing, examining its current as well as future impact.

Facebook — 2.5 Billion Pieces Of Content And 500+ Terabytes Ingested Every Day

The statistic shows that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, comments, etc.

Facebook revealed some big, big stats on big data to a few reporters at its HQ, including that its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It’s pulling in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour. Plus, it gave the first details on its new “Project Prism”.

Big data also encompasses a wide variety of data types, including the following:

  • structured data in databases and data warehouses based on Structured Query Language (SQL);
  • unstructured data, such as text and document files held in Hadoop clusters or NoSQL database systems; and
  • semistructured data, such as web server logs or streaming data from sensors.

All of the various data types can be stored together in a data lake, which typically is based on Hadoop or a cloud object storage service. In addition, big data applications often include multiple data sources that may not otherwise be integrated. For example, a big data analytics project may attempt to gauge a product’s success and future sales by correlating past sales data, return data and online buyer review data for that product.
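To make that example concrete, here is a minimal PySpark sketch of such a project. The data lake paths, column names and schemas below are hypothetical assumptions, not taken from any real system:

```python
# A minimal PySpark sketch of the example above: correlating sales,
# return and review data kept in a data lake. The paths, column names
# and schemas are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("product-insights").getOrCreate()

# Structured sales data exported as Parquet
sales = spark.read.parquet("s3a://my-data-lake/sales/")

# Semi-structured return and review records stored as JSON
returns = spark.read.json("s3a://my-data-lake/returns/")
reviews = spark.read.json("s3a://my-data-lake/reviews/")

# Correlate the three sources per product
summary = (
    sales.groupBy("product_id")
         .agg(F.sum("quantity").alias("units_sold"))
         .join(returns.groupBy("product_id").count()
                      .withColumnRenamed("count", "return_count"),
               "product_id", "left")
         .join(reviews.groupBy("product_id")
                      .agg(F.avg("rating").alias("avg_rating")),
               "product_id", "left")
)

summary.show()
```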

Velocity refers to the speed at which big data is generated and must be processed and analyzed. In many cases, sets of big data are updated on a real- or near-real-time basis, instead of the daily, weekly or monthly updates made in many traditional data warehouses. Big data analytics applications ingest, correlate and analyze the incoming data and then render an answer or result based on an overarching query. This means data scientists and other data analysts must have a detailed understanding of the available data and possess some sense of what answers they’re looking for to make sure the information they get is valid and up to date.

Managing data velocity is also important as big data analysis expands into fields like machine learning and artificial intelligence (AI), where analytical processes automatically find patterns in the collected data and use them to generate insights.
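As an illustration of handling velocity, here is a minimal Spark Structured Streaming sketch that counts user actions in one-minute windows as events arrive rather than in a daily batch. The Kafka broker address, topic name and event fields are assumptions for the example:

```python
# A minimal Spark Structured Streaming sketch: counting user actions in
# one-minute windows as events arrive, instead of in a daily batch job.
# The broker address, topic and event fields are hypothetical, and the
# spark-sql-kafka connector package must be on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("clickstream-velocity").getOrCreate()

schema = (StructType()
          .add("user_id", StringType())
          .add("action", StringType())
          .add("event_time", TimestampType()))

# Read a continuous stream of JSON events from Kafka
events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker:9092")
               .option("subscribe", "clickstream")
               .load()
               .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
               .select("e.*"))

# Aggregate in near real time, tolerating events up to 2 minutes late
counts = (events
          .withWatermark("event_time", "2 minutes")
          .groupBy(F.window("event_time", "1 minute"), "action")
          .count())

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```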

How big data is stored and processed

The need to handle big data velocity imposes unique demands on the underlying compute infrastructure. The computing power required to quickly process huge volumes and varieties of data can overwhelm a single server or server cluster. Organizations must apply adequate processing capacity to big data tasks in order to achieve the required velocity. This can potentially demand hundreds or thousands of servers that can distribute the processing work and operate collaboratively in a clustered architecture, often based on technologies like Hadoop and Apache Spark.
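For a sense of what such a clustered job looks like, here is a minimal PySpark sketch; the HDFS paths and field names are hypothetical. The same script runs unchanged on a laptop or across hundreds of executors, depending only on how it is submitted to the cluster manager:

```python
# A minimal PySpark sketch of a job that a cluster manager such as YARN
# can spread across many executors; paths and field names are hypothetical.
# The same script can be submitted either way, e.g.:
#   spark-submit --master yarn hourly_traffic.py       (cluster)
#   spark-submit --master local[4] hourly_traffic.py   (laptop)
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hourly-traffic").getOrCreate()

# Spark splits the input files into partitions and processes them in parallel
logs = spark.read.json("hdfs:///data/web-logs/2020/09/")

hourly = (logs
          .withColumn("hour", F.date_trunc("hour", F.to_timestamp("timestamp")))
          .groupBy("hour")
          .count()
          .orderBy("hour"))

hourly.write.mode("overwrite").parquet("hdfs:///data/reports/hourly-traffic/")
```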

Achieving such velocity in a cost-effective manner is also a challenge. Many enterprise leaders are reticent to invest in an extensive server and storage infrastructure to support big data workloads, particularly ones that don’t run 24/7. As a result, public cloud computing is now a primary vehicle for hosting big data systems. A public cloud provider can store petabytes of data and scale up the required number of servers just long enough to complete a big data analytics project. The business only pays for the storage and compute time actually used, and the cloud instances can be turned off until they’re needed again.

To improve service levels even further, public cloud providers also offer big data capabilities through managed services, such as Amazon EMR, Microsoft Azure HDInsight and Google Cloud Dataproc.

In cloud environments, big data can be stored in the following:

  • Hadoop Distributed File System (HDFS);
  • lower-cost cloud object storage, such as Amazon Simple Storage Service (S3);
  • NoSQL databases; and
  • relational databases.
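Here is a minimal sketch of what reading from some of these options can look like with Spark; the bucket names, paths, hostnames and credentials are hypothetical:

```python
# A minimal sketch of reading big data from some of the storage options
# listed above with Spark. The bucket, paths, hostnames and credentials
# are hypothetical; NoSQL stores usually need their own Spark connectors
# and are omitted here.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-options").getOrCreate()

# HDFS on an on-premises Hadoop cluster
events_hdfs = spark.read.parquet("hdfs://namenode:8020/data/events/")

# Lower-cost cloud object storage (Amazon S3 via the s3a connector)
events_s3 = spark.read.parquet("s3a://my-bucket/events/")

# A relational database read over JDBC (the driver jar must be available)
customers = (spark.read.format("jdbc")
                  .option("url", "jdbc:postgresql://db-host:5432/shop")
                  .option("dbtable", "customers")
                  .option("user", "reporting")
                  .option("password", "secret")
                  .load())
```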

For organizations that want to deploy on-premises big data systems, commonly used Apache open source technologies in addition to Hadoop and Spark include the following:

  • YARN, Hadoop’s built-in resource manager and job scheduler, which stands for Yet Another Resource Negotiator but is commonly known by the acronym alone;
  • the MapReduce programming framework, also a core component of Hadoop;
  • Kafka, an application-to-application messaging and data streaming platform;
  • the HBase database; and
  • SQL-on-Hadoop query engines, like Drill, Hive, Impala and Presto.
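As a small illustration of the Kafka piece of that stack, here is a minimal sketch using the kafka-python client; the broker address, topic name and message fields are assumptions for the example:

```python
# A minimal sketch using the kafka-python client: an application publishes
# user actions to a Kafka topic so downstream systems (Spark jobs, HBase
# loaders, SQL-on-Hadoop engines) can pick them up. Broker address, topic
# and message fields are hypothetical.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Each user action becomes one small JSON message on the topic
producer.send("clickstream", {"user_id": "u123", "action": "like", "item": "post-42"})
producer.flush()
```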

Users can install the open source versions of the technologies themselves or turn to commercial big data platforms offered by Cloudera, which merged with former rival Hortonworks in January 2019, or Hewlett Packard Enterprise (HPE), which bought the assets of big data vendor MapR Technologies in August 2019. The Cloudera and MapR platforms are also supported in the cloud.
