❗️Tasks❗️

🔥Let’s research and let the world know about the Myths of Hadoop🔥

Task 4.1 :- Individual/Team task:

🔷In a Hadoop cluster, find how to contribute a limited/specific amount of storage as a slave to the cluster.

Before performing the task, let’s discuss Hadoop and its architecture.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

A Hadoop cluster has a NameNode and DataNodes.

The NameNode works as the Master in a Hadoop cluster. Listed below are the main functions performed by the NameNode:

1. Stores metadata about the actual data, e.g. file name, path, number of data blocks, block IDs, block locations, number of replicas, and slave-related configuration.
2. Manages the file system namespace.
3. Regulates client access requests for the actual file data.
4. Assigns work to the Slaves (DataNodes).
5. Executes file system namespace operations like opening/closing files and renaming files and directories.
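To actually see the kind of block metadata the NameNode tracks, you can query it directly, assuming a running cluster with the `hdfs` client on the PATH (the path `/` below is just the HDFS root):

```shell
# List files with their blocks, block IDs and DataNode locations --
# all of this is metadata served by the NameNode, not read from disks
hdfs fsck / -files -blocks -locations

# Browse the namespace the NameNode manages
hdfs dfs -ls /
```

Both commands talk only to the NameNode, which is why HDFS keeps working for metadata operations even while DataNodes are busy serving block data.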

The DataNode works as a Slave in a Hadoop cluster. Listed below are the main functions performed by a DataNode:

  1. Actually stores the business data.
  2. This is the actual worker node where read/write/data processing is handled.
  3. Upon instruction from the Master, it performs creation/replication/deletion of data blocks.
  4. As all the business data is stored on DataNodes, a huge amount of storage is required for their operation. Commodity hardware can be used for hosting DataNodes.
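The directory where a DataNode stores its blocks is set by `dfs.datanode.data.dir` in `hdfs-site.xml`, and the DataNode contributes the capacity of whatever filesystem backs that directory. That is the knob this whole task turns on. A minimal sketch (the path `/dn` is an assumption; use your own data directory):

```xml
<configuration>
  <property>
    <!-- The DataNode stores HDFS blocks under this path; the storage it
         advertises to the NameNode is the capacity of the filesystem
         mounted here -->
    <name>dfs.datanode.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>
```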

Now coming to the task

To perform this task, my team set up a complete Hadoop cluster in the AWS cloud, with 1 NameNode/master and 4 DataNodes/slaves, all connected securely.

After connecting to the NameNode, each DataNode shares its storage with the master node, and its complete storage is shared.

But now I don’t want them to share their complete storage; I want them to limit the storage shared with the NameNode.

🔷In a Hadoop cluster, find how to contribute a limited/specific amount of storage as a slave to the cluster.

“Right now they are sharing their root device, a complete 50 GiB. But it’s not good to put data on the root device, because our OS/instance is running on it; if the OS goes down, we also lose our complete data.”

To overcome this issue and make our data permanent/persistent, we add an extra device, like a pen drive, where we store our data. In AWS this is an EBS volume, so we attach one extra volume to our running instance.

Let’s do this in a practical way.

Right now you can see in my AWS console that 1 NameNode and 1 DataNode are running as a connected cluster, and you can see my DataNode is sharing its complete drive.
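You can confirm from the NameNode how much storage each slave is currently contributing; on a running cluster this shows the full root-device capacity before we change anything:

```shell
# Cluster-wide report: for each live DataNode it prints
# "Configured Capacity", "DFS Used" and "DFS Remaining"
hdfs dfsadmin -report
```

The "Configured Capacity" line per DataNode is the number we want to shrink to a chosen limit.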

Now attach one EBS volume to the instance.

I have created a 30 GiB EBS volume and attached it to the DataNode.
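The same thing can be done from the AWS CLI instead of the console; the availability zone, volume ID, instance ID, and device name below are placeholders for your own values:

```shell
# Create a 30 GiB volume in the same availability zone as the DataNode
aws ec2 create-volume --size 30 --volume-type gp2 \
    --availability-zone ap-south-1a

# Attach it to the DataNode instance; inside the guest it typically
# shows up as /dev/xvdf when attached as /dev/sdf
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf
```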

Now, as we know, if we attach any drive we have to follow certain steps:

  • create a partition
  • format it
  • mount it

Now, here you have 30 GiB of space, and you can limit the space you want to give to the NameNode.

Say I want to give 10 GiB of space.
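On the DataNode instance, this means carving a 10 GiB partition out of the 30 GiB disk. A sketch, assuming the new disk appeared as `/dev/xvdf` (check `lsblk` first); both commands need root:

```shell
# Confirm the new 30 GiB disk and its device name
lsblk

# Create one 10 GiB primary partition; fdisk is interactive:
# type n (new), p (primary), accept the default start sector,
# enter +10G as the last sector, then w to write the table
fdisk /dev/xvdf
```

After writing the table, `lsblk` should show a `/dev/xvdf1` partition of 10 GiB, leaving the remaining 20 GiB unallocated.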

Format it

Mount it
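The format and mount steps together, as root; the mount point `/dn` is an assumption and should match whatever directory your `dfs.datanode.data.dir` points at:

```shell
# Format the new 10 GiB partition with ext4
mkfs.ext4 /dev/xvdf1

# Mount it on the DataNode's data directory
mkdir -p /dn
mount /dev/xvdf1 /dn

# Confirm only ~10 GiB is available on this mount
df -h /dn
```

Because the DataNode now writes its blocks to a filesystem that is only 10 GiB, that is all the storage it can offer the cluster, regardless of how big the root device is.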

Now start the DataNode.
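Starting the daemon and verifying the new limit, assuming Hadoop's scripts are on the PATH (the start script differs between Hadoop versions):

```shell
# Hadoop 1.x/2.x style
hadoop-daemon.sh start datanode
# (on Hadoop 3.x the equivalent is: hdfs --daemon start datanode)

# From the NameNode, check that this DataNode's
# "Configured Capacity" is now ~10 GiB instead of the full root disk
hdfs dfsadmin -report
```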

Using this approach, we can limit the storage.

thank you

Feel free to ask about any bit of it.

connect on linkedin- linkedin.com/in/aditya-kumar-soni-91370b189

Arth2020 | student