Integrating LVM with Hadoop and Providing Elasticity to DataNode Storage

In this article, I will show how to increase the storage capacity of a DataNode in a Hadoop cluster. Sometimes it is preferable to grow the space an existing DataNode provides rather than add a new DataNode to the cluster, for example because of the data we have already stored on that node.

What is LVM?

LVM stands for Logical Volume Manager, the Linux implementation of logical volume management. It is used to resize storage on the fly, and it can combine two or more physical disks into one pool that is used as a whole. For example: we have two storage devices (pen drives or hard disks) of 8GB and 16GB, and a 20GB file that fits on neither disk by itself. With LVM we can store it: both disks are pooled into a Volume Group (VG) of 24GB, and a logical volume carved out of that VG can hold the file.
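The arithmetic behind this example can be sketched in a few lines of shell. The sizes are the ones from the example; in practice the usable space is slightly lower than the sum, because LVM metadata and the filesystem take a small share.

```shell
# Sizes in GB, taken from the example above.
file_size=20
pool=0
fits_single=no

for size in 8 16; do
  # Check whether the file fits on this disk by itself.
  if [ "$file_size" -le "$size" ]; then
    fits_single=yes
  fi
  # Add this disk to the pooled capacity.
  pool=$((pool + size))
done

echo "fits on a single disk: $fits_single"
echo "pooled capacity: ${pool}GB"
if [ "$file_size" -le "$pool" ]; then
  echo "a ${file_size}GB file fits in the pooled space"
fi
```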

How does LVM do it?

Each of the two separate disks is known as a Physical Volume (PV). We first initialize each disk as a PV: “pvcreate <disk_name>”. Next we create a Volume Group from the PVs: “vgcreate <vg_name> <disk_name> <disk_name>”. We can check whether the VG was created with “vgdisplay <vg_name>”. At this point we have our 24GB pool. To carve a Logical Volume (LV), which is LVM's equivalent of a partition, out of the VG we use: “lvcreate --size 21G --name <lv_name> <vg_name>”. We can also extend the VG later by attaching another PV: “vgextend <vg_name> <disk_name>”.
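Put together, the sequence might look like the sketch below. The device names (/dev/sdb, /dev/sdc, /dev/sdd) and the names demo_vg and demo_lv are placeholders I made up for illustration. The run helper is also my own addition: by default it only prints each command, so nothing on the system is touched unless DRY_RUN=0 is set and the script runs as root.

```shell
# Hypothetical names: adjust to your system before running for real.
DISK1=/dev/sdb
DISK2=/dev/sdc
EXTRA_DISK=/dev/sdd
VG=demo_vg
LV=demo_lv

# Print each command by default; set DRY_RUN=0 (as root) to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

lvm_setup() {
  run pvcreate "$DISK1" "$DISK2"               # initialize both disks as PVs
  run vgcreate "$VG" "$DISK1" "$DISK2"         # pool them into one VG
  run vgdisplay "$VG"                          # confirm the VG exists
  run lvcreate --size 21G --name "$LV" "$VG"   # carve a 21GB LV out of the VG
  run vgextend "$VG" "$EXTRA_DISK"             # later: grow the VG with another PV
}

lvm_setup
```

A new LV still needs a filesystem and a mount point before it can hold files; that part is left out here.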

Let’s come to our topic of extending the DataNode.

First, let’s check how much space our DataNode currently provides to the NameNode. It is around 50GB.
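To pull just that number out of the report, a small helper like the one below could be used. It is a sketch that assumes the report contains a line of the form "Configured Capacity: <bytes> (...)", which is what the Hadoop versions I have seen print; the function name is my own.

```shell
# Reads a dfsadmin report on stdin and prints the first
# "Configured Capacity" value (the cluster-wide summary) in whole GB.
configured_capacity_gb() {
  awk -F'[: ]+' '/^Configured Capacity:/ {
    printf "%d\n", $3 / (1024 * 1024 * 1024)
    exit
  }'
}

# Usage on a node of the cluster:
#   hadoop dfsadmin -report | configured_capacity_gb
```

With a single DataNode, the cluster-wide figure is the same as the space that DataNode provides.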

I have added a new 8GB hard disk to the DataNode.

Now create the Physical Volume on the 8GB disk, which is named /dev/sdb, with “pvcreate /dev/sdb”.

After creating the PV, let’s confirm that it exists with the pvdisplay command.
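The two steps above can be wrapped in a small function. The check that the argument really is a block device is a precaution I added, since pvcreate on the wrong disk destroys whatever is on it; the function name is hypothetical.

```shell
# Device name from the article; override with DISK=... if yours differs.
DISK=${DISK:-/dev/sdb}

prepare_pv() {
  disk=$1
  # Refuse anything that is not a block device (typo, missing disk).
  if [ ! -b "$disk" ]; then
    echo "error: $disk is not a block device" >&2
    return 1
  fi
  # Initialize the disk as a PV, then show it to confirm.
  pvcreate "$disk" && pvdisplay "$disk"
}

# Usage (as root):
#   prepare_pv "$DISK"
```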

I am working on RHEL 8, and during installation I selected dynamic allocation of the disk size, so the installer already created a Logical Volume inside a VG named “rhel”.

Let’s add this PV to the VG rhel. We can do this with “vgextend rhel /dev/sdb” and then inspect the VG with “vgdisplay rhel”.

Now I extend the root LV by 5GB, taken from the 8GB disk just added to the VG. The command is: “lvextend --size +5G /dev/rhel/root”.
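As a sketch of these last two steps, the function below prints the commands only after checking that the requested growth fits into the free space of the VG. The function and its arguments are my own illustration; the value 8 is the nominal size of the new disk, and the real free space is a little smaller.

```shell
VG=rhel
LV=/dev/rhel/root
NEW_DISK=/dev/sdb

# Prints the commands for the two steps; nothing is executed.
# $1 = GB to add to the LV, $2 = GB free in the VG once the disk is added.
plan_extend() {
  add_gb=$1
  free_gb=$2
  if [ "$add_gb" -gt "$free_gb" ]; then
    echo "error: ${add_gb}G requested, only ${free_gb}G free in $VG" >&2
    return 1
  fi
  echo "vgextend $VG $NEW_DISK"
  echo "lvextend --size +${add_gb}G $LV"
}

plan_extend 5 8
```

On a real system the free space can be read with “vgs --noheadings --units g -o vg_free rhel” after the vgextend step.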

We can check with lsblk whether the disk /dev/sdb now backs the root LV: the rhel-root volume should be listed under sdb.

This alone is not enough: although the LV has grown, the filesystem on it still has its old size, so the extra 5GB is unusable until the filesystem is resized. This is a resize, not a reformat; reformatting would erase the existing data. To grow the filesystem to the full 55GB we use the command: “fsadm resize /dev/rhel/root”.
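fsadm is a front end that hands the job to the filesystem's own resize tool. The sketch below is my own illustration of that choice (it is not fsadm's actual code) and shows which tool matches which filesystem type, in case you prefer to call it directly.

```shell
# Prints the resize command for a given filesystem type.
# $1 = filesystem type, $2 = LV device, $3 = mount point
grow_fs_cmd() {
  case "$1" in
    xfs)
      echo "xfs_growfs $3"    # XFS is grown through its mount point
      ;;
    ext2|ext3|ext4)
      echo "resize2fs $2"     # ext filesystems are grown through the device
      ;;
    *)
      echo "error: unsupported filesystem type: $1" >&2
      return 1
      ;;
  esac
}

# Usage on the DataNode (findmnt reports the type of the root filesystem):
#   grow_fs_cmd "$(findmnt -n -o FSTYPE /)" /dev/rhel/root /
grow_fs_cmd xfs /dev/rhel/root /
```

Another option is to pass --resizefs to lvextend, which grows the LV and its filesystem in one step.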

Now that the filesystem has been resized and the 5GB has been added, we can check the space the NameNode sees with: “hadoop dfsadmin -report” (on newer Hadoop versions the preferred form is “hdfs dfsadmin -report”).

The space provided by the DataNode has increased from around 50GB to around 55GB.

Thank you for reading.