Can we integrate LVM with Hadoop? (Making our cluster elastic)

Bhavesh Kumawat
4 min read · Dec 20, 2020


It is an interesting concept to make the storage contributed by different DataNodes customizable and elastic. That means the user of each DataNode decides how much storage to contribute to the cluster, and he/she is able to increase or decrease that size on the go.

Challenges faced with the conventional approach:

As we know, whenever we share a directory with the cluster, the DataNode contributes the whole of the free storage on the underlying filesystem. Now consider a DataNode that is an individual system: it has contributed its own storage to the cluster but also has its own tasks to perform on a daily basis. If one day there is a high load on the cluster and the user also has to run a storage-heavy task of his own, he will not be able to do it. So there is a need to limit the storage that is shared with the cluster, and that is where this interesting task comes into play.

Task Description

🌀 Elasticity Task
🔅Integrating LVM with Hadoop and
providing Elasticity to DataNode Storage
🔅Automating LVM Partition using Python-Script.

Solution to the task, in steps:

Step-1

Adding some new hard disks to the system. To view their detailed description, we can use the #fdisk -l command.

(image: details of the new hard disks, each of size 5 GB)
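Looking ahead to the Python automation part of the task, this check can also be scripted. A minimal sketch using subprocess is below; it simply lists all block devices so the new disks (which appear as /dev/sda and /dev/sdb in this setup) can be spotted:

import subprocess

# List every block device with its name, size and type; the newly
# attached disks should show up in this output.
result = subprocess.run(
    ["lsblk", "-o", "NAME,SIZE,TYPE"],
    capture_output=True, text=True, check=True
)
print(result.stdout)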

Step-2

Creating physical volumes from the new hard disks with the following commands; we can confirm the creation with the help of #pvdisplay [pv_path] (for example, #pvdisplay /dev/sda).

#pvcreate /dev/sda

#pvcreate /dev/sdb
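This step can be scripted in the same way; a minimal sketch is below, where the device names follow the ones used above and would differ on another machine:

import subprocess

# Disks to initialize as LVM physical volumes (same devices as above).
disks = ["/dev/sda", "/dev/sdb"]

for disk in disks:
    # Turn the raw disk into a physical volume.
    subprocess.run(["pvcreate", disk], check=True)
    # Print its details to confirm the creation.
    subprocess.run(["pvdisplay", disk], check=True)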

Step-3

Creating a volume group and confirming its creation with the help of the following commands:

#vgcreate [name_of_vg] /dev/sda /dev/sdb -> for creating the volume group

#vgdisplay [name_of_vg] -> for displaying its info

(image: information of the volume group)
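Scripted, this step might look like the sketch below; the volume group name hadoop is an assumption taken from the path /dev/hadoop/mydatanode used later in this article:

import subprocess

vg_name = "hadoop"                # volume group name assumed from /dev/hadoop/mydatanode
pvs = ["/dev/sda", "/dev/sdb"]    # physical volumes created in Step-2

# Combine the physical volumes into one volume group.
subprocess.run(["vgcreate", vg_name] + pvs, check=True)

# Display the volume group to confirm its total and free size.
subprocess.run(["vgdisplay", vg_name], check=True)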

Step-4

Creating a new logical volume from the volume group using the following command:

#lvcreate --size 3G --name mydatanode [name_of_vg] -> 3G is the size, i.e. 3 GiB
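A scripted sketch of this step is below. It also formats the volume with mkfs.ext4; the ext4 choice is an assumption based on the later use of resize2fs, which works on ext filesystems:

import subprocess

vg_name = "hadoop"
lv_name = "mydatanode"
lv_path = f"/dev/{vg_name}/{lv_name}"

# Carve a 3 GiB logical volume out of the volume group.
subprocess.run(["lvcreate", "--size", "3G", "--name", lv_name, vg_name], check=True)

# Put an ext4 filesystem on it so it can be mounted (and later resized with resize2fs).
subprocess.run(["mkfs.ext4", lv_path], check=True)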

Before mounting our volume on the directory, let's first see the amount of storage currently shared with the Hadoop cluster:

Step-5

Formatting the newly created volume (for example with #mkfs.ext4 /dev/hadoop/mydatanode) and mounting it on the directory shared with the cluster, i.e. the DataNode directory configured as dfs.datanode.data.dir in hdfs-site.xml:

#mount /dev/hadoop/mydatanode /dn -> /dev/hadoop/mydatanode is the full name of the volume

To confirm, we can use the #df -h command.

(image: output of df -h before and after mounting the logical volume)
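To do the mount and the confirmation from Python instead, a minimal sketch (the mount point /dn and the volume path are the ones used in this article):

import shutil
import subprocess

lv_path = "/dev/hadoop/mydatanode"
mount_point = "/dn"    # DataNode directory used in this article

# Mount the logical volume on the DataNode directory.
subprocess.run(["mount", lv_path, mount_point], check=True)

# Report how much space the DataNode directory now sees.
usage = shutil.disk_usage(mount_point)
print(f"total: {usage.total / 2**30:.1f} GiB, free: {usage.free / 2**30:.1f} GiB")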

Now see the report on the cluster again:

(image: cluster report after mounting the volume)

Hurray! Limiting the shared storage has been achieved successfully.

Checking the elasticity

Now we will try to increase the size by 2 GiB with the help of the following command:

#lvextend --size +2G /dev/hadoop/mydatanode

Resizing the filesystem over the new storage is also necessary, and can be achieved with #resize2fs /dev/hadoop/mydatanode (this grows the ext4 filesystem online, without disturbing the existing data).

(image: execution of the above commands)
(image: result in the web UI of the cluster)

With the commands above we first have to extend the volume and then resize the filesystem separately. The same can be achieved with a single command: #lvresize --resizefs --size [desired_size] /dev/hadoop/mydatanode
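This one-command resize is also the easiest thing to wrap in a Python helper for the automation script. A sketch is below; the size string is passed straight through to lvresize (for example "+2G" or "4G"):

import subprocess

def resize_datanode_volume(size, lv_path="/dev/hadoop/mydatanode"):
    # size can be absolute ("4G") or relative ("+2G"), exactly as lvresize expects;
    # --resizefs also resizes the filesystem, so no separate resize2fs call is needed.
    subprocess.run(["lvresize", "--resizefs", "--size", size, lv_path], check=True)

# Example: give the cluster 2 GiB more.
resize_datanode_volume("+2G")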

Here is an example in which the shareable storage is set to 7 GiB initially and then reduced to 4 GiB. (Note that unlike growing, shrinking an ext4 filesystem generally requires unmounting it first.)

(image: storage shared initially)
(image: execution of the commands)
(image: result after successful execution)

Hence integrating LVM with Hadoop is successful and elasticity is also achieved.

This whole task can also be automated with a Python script.
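A minimal sketch of such a script is given below. It simply chains the commands used above; the device names, volume group name, logical volume name, and mount point are the ones from this article and should be adjusted for your own DataNode. Run it as root.

import subprocess

# Names used throughout this article; change them for your own setup.
DISKS = ["/dev/sda", "/dev/sdb"]
VG_NAME = "hadoop"
LV_NAME = "mydatanode"
LV_SIZE = "3G"
MOUNT_POINT = "/dn"

def run(cmd):
    # Echo each command before running it so every LVM step is visible.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def setup_datanode_storage():
    # Create PVs, a VG and an LV, format it, and mount it on the DataNode directory.
    for disk in DISKS:
        run(["pvcreate", disk])
    run(["vgcreate", VG_NAME] + DISKS)
    run(["lvcreate", "--size", LV_SIZE, "--name", LV_NAME, VG_NAME])
    lv_path = f"/dev/{VG_NAME}/{LV_NAME}"
    run(["mkfs.ext4", lv_path])
    run(["mount", lv_path, MOUNT_POINT])

def resize_datanode_storage(size):
    # Grow or shrink the shared storage, e.g. size="+2G" or size="4G".
    run(["lvresize", "--resizefs", "--size", size, f"/dev/{VG_NAME}/{LV_NAME}"])

if __name__ == "__main__":
    setup_datanode_storage()
    # Later, to contribute 2 GiB more to the cluster:
    # resize_datanode_storage("+2G")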

Thanks for reading.
