Integrating LVM with Hadoop

Akshit Modi
6 min read · Mar 14, 2021

Before integrating LVM with Hadoop, let's first talk about LVM.

What is LVM?

LVM, or Logical Volume Management, is a storage device management technology that gives users the power to pool and abstract the physical layout of component storage devices for easier and more flexible administration.

Using LVM we can make our storage dynamic. We can combine multiple physical volumes into one single logical hard disk and operate it like a normal hard disk.

Suppose you have two hard disks of 50 GB each, but you have a single file of around 60 GB. You cannot store this file on either hard disk alone. Using LVM we can create a virtual hard disk of 100 GB (or 80–90 GB) by combining these two hard disks. Now we can easily store the file on this combined disk: some part of the data goes to the first physical hard disk and some part to the other.

Behind the scenes, LVM maintains its own metadata (much like a filesystem's inode table) that keeps track of which data is stored on which hard disk.

I know this may sound hard, but once we actually do it, you will see it is exciting and as simple as ABC.

Let's get back to our main concern.

If you are new to Hadoop, you can refer to my earlier blog where I have covered some basics of Hadoop.

We know that the Hadoop master node running the NameNode process takes care of all the filesystem namespace of HDFS, while the slave nodes provide actual storage to store the files and folders.

In this blog we are focusing on how to integrate LVM with Hadoop.

For simplicity, we are going to take a small setup.

Suppose that you have two hard disks inserted into your machine. The first hard disk is 20 GB and the other is 40 GB. While configuring this machine as the slave system, you decide to donate space from a partition made on the second hard disk (40 GB). For the sake of simplicity, assume that there is only one partition on this hard disk, and it is almost the size of the disk, a little less because of the space reserved for storing metadata. Also assume that the cluster currently consists of only one machine, i.e. it is a single-node cluster.

Here you are contributing 40 GB to the Hadoop cluster, but a client wants to upload a 43 GB file. As you know, in this case the upload will fail. Then you remember the other hard disk of 20 GB, which is not being used at all. But how do you use it, when there is one single 43 GB file? This is where LVM comes in.

So, let’s get started.

Here we have two hard disks and we want to create one single logical hard disk out of them.

Note: I am running this practical on Red Hat Linux (RHEL) and have already attached two hard disks to it.

Step 1: Create a physical volume

You can see that I have two devices here, /dev/sdb and /dev/sdc.

For demo purposes I have used two hard disks of 4 GB each.
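
If you want to see the attached disks on your own machine, lsblk (or fdisk -l) lists all block devices; the device names below are from my setup and may differ on yours.

[root@localhost ~]# lsblk
[root@localhost ~]# fdisk -l /dev/sdb /dev/sdc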

Let's create physical volumes out of them.

[root@localhost ~]# pvcreate <hd_name>
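
For example, with the two demo disks shown above (device names taken from my setup, adjust them to yours), and pvdisplay to verify the result:

[root@localhost ~]# pvcreate /dev/sdb /dev/sdc
[root@localhost ~]# pvdisplay /dev/sdb /dev/sdc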

Step 2: Create a volume group.

A volume group basically combines the two hard disks into one single pool of storage. So at the end of this step we have one single virtual hard disk of 8 GB.

[root@localhost ~]# vgcreate <vg_name> <hd1_name> <hd2_name>

Here we must use the same hard disks on which we created the physical volumes above.
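
In my setup the volume group is named datastore (the name used in the rest of this post), so the concrete command looks like this, again assuming the same two demo disks:

[root@localhost ~]# vgcreate datastore /dev/sdb /dev/sdc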

Using vgdisplay we can check all the volume groups that we (or the system) have created. For a specific VG, pass the VG name.

[root@localhost ~]# vgdisplay datastore

Step 3: Create logical volume.

In this step we are going to carve a partition (a logical volume) out of the volume group.

[root@localhost ~]# lvcreate --size <size_of_partition> --name <name_of_partition> <vgname>
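
For example, to create the 6 GB logical volume named store1 inside the datastore volume group that the later commands use, and to verify it with lvdisplay:

[root@localhost ~]# lvcreate --size 6G --name store1 datastore
[root@localhost ~]# lvdisplay /dev/datastore/store1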

Step 4: Format partition.

As you all know, we have to format a partition before we can use it. Here we are going to format it with the ext4 filesystem.

[root@localhost ~]# mkfs.ext4 /dev/<vgname>/<lvname>
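
With the names used in this demo, the command becomes:

[root@localhost ~]# mkfs.ext4 /dev/datastore/store1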

Step 5: Mount to the folder.

As you know, we have to mount it, since we cannot interact with the device directly.

The steps above are basic LVM concepts, but now we want to donate this storage to the datanode, so we have to mount it accordingly.

I have already created the directory /hadoop/hadoopdata/hdfs/datanode and then mounted the partition there.

[root@localhost ~]# mount /dev/datastore/store1 /hadoop/hadoopdata/hdfs/datanode

Let's check whether it is working or not.
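
A quick way to verify is df -h on the mount point, which should show the partition mounted with roughly 6 GB of space:

[root@localhost ~]# df -h /hadoop/hadoopdata/hdfs/datanode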

Yeah!! We got our 6 GB hard disk, and now we can donate it to the Hadoop cluster.

To summarize the concept visually: the two physical volumes are pooled into the datastore volume group, and the store1 logical volume carved out of it is mounted on the datanode directory.

Let's donate this storage to the Hadoop cluster.

For this, you have to add this mount point as the datanode's data directory in the hdfs-site.xml file (in our single-node setup, the master and slave are the same machine).
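
A minimal sketch of the relevant entry, assuming a Hadoop 2.x/3.x setup where the datanode data directory is configured with the dfs.datanode.data.dir property; the path is the mount point we created above, so adjust it to your own directory:

<property>
  <!-- data directory backed by the LVM mount created above -->
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/hadoopdata/hdfs/datanode</value>
</property>

After editing hdfs-site.xml, restart the datanode daemon so that it picks up the new data directory.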

Let's check the HDFS report.

[root@localhost ~]# hdfs dfsadmin -report

You can see we have got around 6 GB of capacity in the Hadoop cluster.

Well, the most important question is: why are we using LVM and following this long approach?

The most obvious reason is that it lets you derive space from more than one physical hard disk. But, another significant advantage of using Logical Volumes is that they create dynamic partitions.

What does that mean?

Let's say we want to increase our hard disk (logical volume) size from 6 GB to 7 GB.

With LVM we can extend the size of the logical volume on the fly, and it will not affect the data already stored on it.

When we extend the partition size, storage is derived from the volume group. Also, one can reduce logical volumes, in which case the extra storage is returned to the Volume Group.

Let’s go for it…

Step 1: Extend the logical volume.

[root@localhost ~]# lvextend --size +1G /dev/datastore/store1

Step 2: Resize the filesystem.

Unlike a fresh format, resize2fs grows the existing ext4 filesystem in place, so it will not delete the data already stored on the partition.

[root@localhost ~]# resize2fs /dev/datastore/store1
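
To confirm that the filesystem really grew, check the mount point again; it should now show roughly 7 GB:

[root@localhost ~]# df -h /hadoop/hadoopdata/hdfs/datanode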

And now let's check the cluster capacity again from the master node…
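
The same report command as before shows the updated configured capacity (assuming the datanode is still running):

[root@localhost ~]# hdfs dfsadmin -report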

So now the HDFS slave sharing a logical volume is truly a master of its own storage: it can decide when it wants to donate more space and when to reclaim unused space from the LV.

Also, did you notice that the root volume of our RHEL system is itself a logical volume? So we can use this same approach to extend our local virtual machine's storage. It can save us a lot of time!

Anyway, I hope you enjoyed reading this blog!
