❗️Tasks❗️
🔥Let’s research and let the world know about the Myths of Hadoop🔥
Task 4.1: Individual/Team task
🔷In a Hadoop cluster, find out how to contribute a limited/specific amount of storage as a slave to the cluster.
Before performing the task, let’s discuss Hadoop and its architecture.
Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
A Hadoop cluster has two kinds of nodes: the NameNode and the DataNodes.
The NameNode works as the Master in a Hadoop cluster. Listed below are the main functions performed by the NameNode:
1. Stores metadata about the actual data, e.g. filename, path, number of data blocks, block IDs, block locations, number of replicas, and slave-related configuration.
2. Manages the file system namespace.
3. Regulates client access requests for the actual file data.
4. Assigns work to the Slaves (DataNodes).
5. Executes file system namespace operations like opening/closing files and renaming files and directories.
The DataNode works as a Slave in a Hadoop cluster. Listed below are the main functions performed by the DataNode:
1. Actually stores the business data.
2. This is the actual worker node where read/write/data processing is handled.
3. Upon instruction from the Master, it performs creation/replication/deletion of data blocks.
4. As all the business data is stored on the DataNodes, a huge amount of storage is required for their operation. Commodity hardware can be used for hosting a DataNode.
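To make this split concrete, here is a small, hedged example (the file name and HDFS path are hypothetical) showing that the NameNode answers metadata queries such as block IDs and block locations, while the blocks themselves live on the DataNodes:

```bash
# Upload a file into HDFS; the NameNode records its metadata,
# the DataNodes store the actual blocks.
hdfs dfs -put myfile.txt /user/hadoop/myfile.txt

# Ask the NameNode for the metadata it keeps for that file:
# block IDs, replication count, and the DataNodes holding each block.
hdfs fsck /user/hadoop/myfile.txt -files -blocks -locations
```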
Now coming to the task
To perform this task, my team set up a complete Hadoop cluster on the AWS cloud with 1 NameNode (master) and 4 DataNodes (slaves), with every node connected in a secure setup.
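For reference, here is a minimal, hedged sketch of the configuration placed on each DataNode (assuming Hadoop 3.x property names, a hypothetical NameNode hostname `namenode.example.com`, and a hypothetical data directory `/dn`):

```bash
# On each DataNode: point the node at the NameNode and tell HDFS
# which directory holds this node's blocks. That directory decides
# how much storage the slave can contribute.
cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
EOF

cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>
EOF
```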
After connecting to the NameNode, the DataNodes share their storage with the master node, and by default their complete storage gets shared.
But now I don’t want them to share their complete storage; I want to limit the amount of storage they share with the NameNode.
“Right now they are sharing their complete root device, say 50 GiB. But it is not good to put data on the root device, because our OS/instance runs on the root device; if the OS goes down, we also lose our complete data.”
To overcome this issue and make our data permanent/persistent, we add an extra device, like a pen drive, where we store our data. In AWS this is an EBS volume, so we attach one extra volume to our running instance.
Let’s do this in a practical way.
Right now in my AWS console you can see 1 NameNode and 1 DataNode running as a connected cluster, and you can see my DataNode is sharing its complete drive.
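A quick, hedged way to confirm this (assuming the `hdfs` client is configured on the node) is the admin report, which lists each DataNode’s configured capacity:

```bash
# Run on the NameNode (or any node with an HDFS client configured).
# "Configured Capacity" per DataNode shows how much storage that
# slave is currently contributing to the cluster.
hdfs dfsadmin -report
```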
Now attach one EBS volume to the instance.
I have created a 30 GiB EBS volume and attached it to the DataNode.
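This can be done from the AWS console, or with the AWS CLI; here is a hedged sketch (the availability zone, volume ID, and instance ID are placeholders you would replace with your own):

```bash
# Create a 30 GiB EBS volume in the same availability zone as the DataNode.
aws ec2 create-volume --size 30 --availability-zone ap-south-1a --volume-type gp2

# Attach it to the DataNode instance; it will show up as an extra block device.
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf
```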
Now, as we know, whenever we attach a new drive we have to follow certain steps:
- create partition
- format it
- mount it
Now, here you have 30 GiB of space, and you can limit the space you want to give to the NameNode by partitioning only the amount you want to share.
Let’s say I want to give 10 GiB of space, so I create a 10 GiB partition.
Then format it.
Then mount it on the DataNode’s data directory (the commands are shown below).
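Here is a hedged sketch of those steps (assuming the new EBS volume shows up as `/dev/xvdf` and the DataNode’s data directory is `/dn`, as configured in hdfs-site.xml; check the actual device name with `lsblk`):

```bash
# 1. Create a 10 GiB partition on the new 30 GiB volume.
#    fdisk is interactive; the answers below create one 10 GiB primary partition.
fdisk /dev/xvdf
#   n       -> new partition
#   p       -> primary
#   1       -> partition number
#   <Enter> -> accept default first sector
#   +10G    -> size of the partition
#   w       -> write the partition table and exit

# 2. Format the new 10 GiB partition with an ext4 filesystem.
mkfs.ext4 /dev/xvdf1

# 3. Mount it on the DataNode's data directory so HDFS can only
#    ever use those 10 GiB.
mkdir -p /dn
mount /dev/xvdf1 /dn
```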
Now start the DataNode.
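A hedged sketch of starting the daemon and verifying the new limit (the start command differs between Hadoop versions; `hdfs --daemon start datanode` is the Hadoop 3.x form):

```bash
# Start the DataNode daemon on the slave (Hadoop 3.x syntax).
hdfs --daemon start datanode

# Back on the NameNode, confirm the DataNode now reports roughly
# 10 GiB of configured capacity instead of the whole root device.
hdfs dfsadmin -report
```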
Using this approach, we can limit the storage a DataNode contributes to the cluster.
Thank you!
Feel free to ask about any bit of it.
connect on linkedin- linkedin.com/in/aditya-kumar-soni-91370b189