Can we integrate LVM with Hadoop? (Making our cluster elastic)
It is an interesting concept to make the storage contributed by each DataNode customizable and elastic. In other words, the owner of a DataNode decides how much storage is contributed to the cluster and can increase or decrease that size on the go.
Challenges with the conventional approach:-
As we know, whenever we share a directory with the cluster, the DataNode contributes all of the free storage on that filesystem. Now consider a DataNode that is an individual system: it contributes its own storage to the cluster but also has its own tasks to perform on a daily basis. On a day when the cluster is under heavy load and the local user also needs a large amount of storage for their own work, they will not be able to get it. So there is a need to limit the storage shared with the cluster, and that is where this interesting task comes into play.
Task Description
🌀 Elasticity Task
🔅Integrating LVM with Hadoop and
providing Elasticity to DataNode Storage
🔅Automating LVM Partition using Python-Script.
Solution of the task in steps:-
Step-1
Attach one or more new hard disks to the system. To view their detailed description we can use the #fdisk -l command.
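The attached disks can also be listed programmatically. Here is a minimal Python sketch that reads /sys/block; the device names it prints (sda, sdb, ...) depend entirely on your system:
# List block devices and their sizes by reading /sys/block.
import os
SECTOR_SIZE = 512  # /sys/block reports sizes in 512-byte sectors
for dev in sorted(os.listdir("/sys/block")):
    try:
        with open(f"/sys/block/{dev}/size") as f:
            sectors = int(f.read().strip())
    except OSError:
        continue
    print(f"{dev}: {sectors * SECTOR_SIZE / 1024**3:.1f} GiB")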
Step-2
Create physical volumes from the new hard disks with the following commands; we can confirm the creation with the help of #pvdisplay [pv_name]
#pvcreate /dev/sda
#pvcreate /dev/sdb
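For the automation later on, these commands can also be issued from Python. A minimal sketch using the subprocess module; the disk names /dev/sda and /dev/sdb are just the ones used in this example and will differ on your system:
# Create LVM physical volumes on the given disks with pvcreate.
import subprocess
disks = ["/dev/sda", "/dev/sdb"]  # adjust to the disks shown by fdisk -l
for disk in disks:
    subprocess.run(["pvcreate", disk], check=True)   # check=True raises on failure
    subprocess.run(["pvdisplay", disk], check=True)  # confirm the new PV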
Step-3
Create a volume group and confirm its creation with the following commands
#vgcreate [name_of_vg] /dev/sda /dev/sdb -> for creation
#vgdisplay [name_of_vg] -> for displaying info
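The volume group step can be scripted in the same way. The sketch below assumes the volume group is named hadoop (the name used later in this article) and uses vgs to print its size in a script-friendly form:
# Create a volume group from the physical volumes and query its size.
import subprocess
vg_name = "hadoop"                  # assumed VG name used in this article
pvs = ["/dev/sda", "/dev/sdb"]
subprocess.run(["vgcreate", vg_name] + pvs, check=True)
# vgs with --noheadings gives clean, parseable output
out = subprocess.run(
    ["vgs", "--noheadings", "--units", "g", "-o", "vg_name,vg_size,vg_free", vg_name],
    check=True, capture_output=True, text=True,
).stdout
print(out.strip())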
Step-4
Create a new logical volume from the volume group using the following command
#lvcreate --size 3G --name mydatanode [name_of_vg] -> 3G is the size in GB
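The same step from Python; again a sketch that assumes the hadoop volume group name used in this article:
# Carve a 3 GB logical volume named mydatanode out of the hadoop VG.
import subprocess
vg_name = "hadoop"       # assumed VG name
lv_name = "mydatanode"
size = "3G"              # size to be contributed to the cluster
subprocess.run(["lvcreate", "--size", size, "--name", lv_name, vg_name], check=True)
subprocess.run(["lvdisplay", f"/dev/{vg_name}/{lv_name}"], check=True)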
Before mounting our volume on the shared directory, let's see the amount of storage currently shared with the Hadoop cluster (for example, with the hadoop dfsadmin -report command):-
Step-5
Format the newly created volume (e.g. with #mkfs.ext4 /dev/hadoop/mydatanode) and mount it on the directory shared with the cluster
#mount /dev/hadoop/mydatanode /dn -> /dev/hadoop/mydatanode is the full name of the volume (here the volume group is named hadoop) and /dn is the shared directory
To confirm the mount we can use the #df -h command.
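Formatting, mounting and verification can likewise be scripted. A minimal sketch, assuming the ext4 filesystem and the /dn mount point used above:
# Format the logical volume, mount it on the DataNode directory,
# and report how much space the directory now provides.
import shutil
import subprocess
device = "/dev/hadoop/mydatanode"   # VG "hadoop", LV "mydatanode" as above
mount_point = "/dn"                 # directory shared with the Hadoop cluster
subprocess.run(["mkfs.ext4", device], check=True)          # one-time format
subprocess.run(["mount", device, mount_point], check=True)
usage = shutil.disk_usage(mount_point)
print(f"{mount_point}: {usage.total / 1024**3:.1f} GiB total, "
      f"{usage.free / 1024**3:.1f} GiB free")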
Now see the output on the cluster again
Hurray! Limiting the shared storage has been achieved successfully.
Checking the elasticity
Now we will try to increase the size by 2 GB with the help of the following command
#lvextend --size +2G /dev/hadoop/mydatanode
The filesystem must also be resized to use the new space, which can be done with #resize2fs /dev/hadoop/mydatanode (this grows the existing filesystem in place, without reformatting it).
So with the commands above we first extend the logical volume and then resize the filesystem. The same can be achieved with a single command: #lvresize --resizefs --size [desired_size] /dev/hadoop/mydatanode
Here is an example in which the shareable storage is initially set to 7 GB and then reduced to 4 GB.
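Resizing can be wrapped in the same way. A sketch that uses lvresize --resizefs so the filesystem is adjusted along with the volume:
# Grow or shrink the DataNode volume and its filesystem in one step.
import subprocess
device = "/dev/hadoop/mydatanode"
def resize_volume(new_size: str) -> None:
    # new_size examples: "+2G" to grow by 2 GB, "4G" for an absolute size.
    # Note: shrinking an ext4 filesystem generally requires unmounting it first.
    subprocess.run(["lvresize", "--resizefs", "--size", new_size, device], check=True)
resize_volume("+2G")   # e.g. grow the shared storage by 2 GB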
Hence, integrating LVM with Hadoop is successful and elasticity is also achieved.
This whole task can also be automated with a Python script.
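A minimal sketch of such an automation script could look like the one below; it simply chains the commands from the steps above, and all the names in it (the hadoop VG, the mydatanode LV, the /dn mount point) are the example names used in this article:
# A minimal, menu-driven sketch that chains the LVM steps from this article.
# Run as root; adjust disk names, VG/LV names and the mount point as needed.
import subprocess
def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)
def create_datanode_storage(disks, vg, lv, size, mount_point):
    for disk in disks:
        run(["pvcreate", disk])
    run(["vgcreate", vg] + disks)
    run(["lvcreate", "--size", size, "--name", lv, vg])
    run(["mkfs.ext4", f"/dev/{vg}/{lv}"])
    run(["mount", f"/dev/{vg}/{lv}", mount_point])
def resize_datanode_storage(vg, lv, new_size):
    run(["lvresize", "--resizefs", "--size", new_size, f"/dev/{vg}/{lv}"])
if __name__ == "__main__":
    print("1. Create DataNode storage")
    print("2. Resize DataNode storage")
    choice = input("Enter choice: ").strip()
    if choice == "1":
        disks = input("Disks (space separated, e.g. /dev/sda /dev/sdb): ").split()
        size = input("Size to contribute (e.g. 3G): ").strip()
        create_datanode_storage(disks, "hadoop", "mydatanode", size, "/dn")
    elif choice == "2":
        new_size = input("New size (e.g. +2G or 4G): ").strip()
        resize_datanode_storage("hadoop", "mydatanode", new_size)
    else:
        print("Unknown choice")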
Thanks for reading.