Discussion:
Xen Memory De-duplication
Aditya Gadre
2010-10-09 17:56:23 UTC
Our aim is to implement Xen memory de-duplication with minimal overhead.

Our approach to de-duplication is as follows.

In most cases, Domain-U runs a small set of well-known operating systems
such as Linux, FreeBSD and Microsoft Windows. In such an environment, many
domains share read-only filesystems that contain the operating system and
frequently used program files and libraries. Each domain has its own
writable filesystem for storing data and temporary files. In this
configuration, pages scattered across different domains often happen to
contain the same disk block. So, to perform de-duplication, we intend to
add a data structure in Dom0 that stores the disk block number and the
machine frame number (MFN) when a read request for read-only code (and
data) is made. When another Domain-U requests the same block and Dom0
receives the I/O (DMA) request, it first checks the data structure for an
entry for that block. If it finds one, it returns the MFN of the
already-read page and maps it to the requesting domain's PFN, resulting in
zero I/O processing time for blocks that have already been read. This in
turn de-duplicates the read-only pages accessed by multiple domains
without any overhead of hashing pages.
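To make the idea concrete, here is a minimal Python sketch of the proposed
Dom0-side lookup. This is not Xen code; the class, the fake MFN allocation
and the per-domain PFN maps are all hypothetical stand-ins used only to
illustrate the block-number-to-MFN logic described above.

```python
# Hypothetical sketch (not Xen code): a Dom0-side table that maps a disk
# block number to the MFN of the page the block was first read into.

class DedupTable:
    """Maps disk block number -> MFN of the page already holding that block."""

    def __init__(self):
        self.block_to_mfn = {}
        self.io_reads = 0  # count of real (simulated) disk reads performed

    def read_block(self, domain_pfn_map, pfn, block):
        """Handle a read request: reuse an existing MFN if the block is
        already resident, otherwise do the (simulated) disk read and
        record the new mapping."""
        if block in self.block_to_mfn:
            mfn = self.block_to_mfn[block]        # shared page, no I/O needed
        else:
            self.io_reads += 1                    # simulated DMA read from disk
            mfn = len(self.block_to_mfn) + 1000   # pretend-allocated frame
            self.block_to_mfn[block] = mfn
        domain_pfn_map[pfn] = mfn                 # map the MFN to the domain's PFN
        return mfn

table = DedupTable()
dom1, dom2 = {}, {}
table.read_block(dom1, pfn=5, block=42)   # first read: hits the disk
table.read_block(dom2, pfn=9, block=42)   # second domain: shared, no I/O
print(table.io_reads)        # 1
print(dom1[5] == dom2[9])    # True: both domains map the same frame
```

The point of the sketch is the invariant: however many domains read block
42, only one disk read and one machine frame are ever consumed for it.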

Test case scenario:

Consider a Dom0 Linux kernel using a filesystem with de-duplication
enabled. We install a DomU kernel with its virtual disk as an image file
(.img) on that filesystem, then make multiple copies of the image to
deploy multiple DomUs running the same kernel. Because de-duplication is
enabled in the filesystem, initially all the blocks of the domains point
to the same disk blocks. When the kernels are booted, they consume memory
only once for the programs (code segments) loaded into memory. As these
OSes start writing to their own virtual filesystems, the affected blocks
of the images are COW'ed by the filesystem, resulting in different block
numbers.
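The sharing-then-divergence behaviour in this scenario can be sketched as
follows. This is a toy model, not a real filesystem: the block allocator,
the image-as-dict representation and the block store are all invented for
illustration; it only shows how a write breaks block sharing while
unwritten blocks stay de-duplicated.

```python
# Hypothetical sketch: filesystem-level copy-on-write, as in the test
# scenario. Two image copies initially share every block; a write
# allocates a fresh block for the writer only.

next_block = [100]   # toy block allocator state

def clone_image(image):
    """De-dup-aware copy: the clone references the same disk blocks."""
    return dict(image)

def write_block(image, offset, data, store):
    """COW: allocate a new block for the written offset only."""
    new_blk = next_block[0]
    next_block[0] += 1
    image[offset] = new_blk
    store[new_blk] = data

base = {0: 1, 1: 2}                  # image offset -> shared disk block number
store = {1: b"kernel", 2: b"libs"}   # block number -> contents
copy = clone_image(base)

assert base[0] == copy[0]            # fully shared before any write
write_block(copy, 1, b"log", store)
print(base[1], copy[1])              # block numbers now differ for offset 1
print(base[0] == copy[0])            # True: the unwritten block is still shared
```

Under this model the proposed block-number-to-MFN table keeps working for
the still-shared blocks, while freshly COW'ed blocks get new numbers and
naturally fall out of the sharing.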
Has such an approach been implemented? We intend to implement this as a
project. What are the expected challenges?


Regards,
Aditya Gadre
Pasi Kärkkäinen
2010-10-09 19:09:20 UTC
Post by Aditya Gadre
Has such an approach been implemented? We intend to implement this as a
project. What are the expected challenges?
Yeah, I think the image COW is possible using the Xen blktap2 vhd support,
and also maybe Xen qcow* stuff.

Also check the Xen 4.0 wiki page for more info about memory sharing etc:
http://wiki.xensource.com/xenwiki/Xen4.0

-- Pasi
Dan Magenheimer
2010-10-09 23:40:30 UTC
I'm not an expert on it, but I believe this sounds very similar to the page-sharing implementation that already exists in Xen 4.0. The implementation in Xen only works on HVM guests, and only on machines that have EPT, though. The patches (which were accepted into Xen) were posted here:



http://lists.xensource.com/archives/html/xen-devel/2009-12/msg00797.html



Aditya Gadre
2010-10-10 05:24:58 UTC
This kind of implementation will require the disk blocks from different
DomUs to be mapped to the same physical disk blocks.
For example,
1) A shared read-only filesystem
2) A union-based filesystem
3) Virtual machine images deployed on a host filesystem with
de-duplication enabled

What kind of filesystem arrangement is used in production environments
for DomUs that host large numbers of VMs, as in a cloud environment?
Pasi Kärkkäinen
2010-10-10 12:34:08 UTC
Post by Aditya Gadre
This kind of implementation will require the disk blocks from different
DomUs to be mapped to the same physical disk blocks.
For example,
1) A shared read-only filesystem
2) A union-based filesystem
3) Virtual machine images deployed on a host filesystem with
de-duplication enabled
I guess Xen blktap qcow* images should do? And maybe blktap2 VHD?

-- Pasi
Shriram Rajagopalan
2010-10-11 07:58:47 UTC
Not sure about the DMA part, but I suggest you also take a look at the
Satori project code (the memshr modules) in Xen:
http://www.usenix.org/events/usenix09/tech/slides/milos.pdf
--
perception is but an offspring of its own self
Thomas Goirand
2010-10-12 10:20:08 UTC
Post by Aditya Gadre
This kind of implementation will require the disk blocks from
different DomUs to be mapped to the same physical disk blocks.
For example,
1) A shared read-only filesystem
2) A union-based filesystem
3) Virtual machine images deployed on a host filesystem with
de-duplication enabled
What kind of filesystem arrangement is used in production
environments for DomUs that host large numbers of VMs, as in a cloud
environment?
I don't know about others, but for us (e.g. at GPLHost), none of
what you described above is doable. Each VM has its own
LVM partition, and we won't have a shared filesystem among
many VMs. Never ever. We don't use virtual machine *images*
either.

What would be nicer would be a more general approach, perhaps
with the possibility of using a filesystem that is already
mounted on the dom0. Why? Because most of the time, what
is wasted is the free space in each LVM partition, in the setup
I described above.

Thomas
Tim Deegan
2010-10-12 10:33:26 UTC
Post by Thomas Goirand
What would be nicer, would be a more general approach, and
maybe have the possibility to use a filesystem that is already
mounted on the dom0.
Do you want something more than NFS/CIFS mounts already offer?

Tim.
--
Tim Deegan <***@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
Tim Deegan
2010-10-11 12:59:46 UTC
Post by Aditya Gadre
Has such an approach been implemented? We intend to implement this as a
project. What are the expected challenges?
Yes, this was implemented last year; the patches are in the xen-unstable
tree. They hook the read path in blktap to detect duplicate reads of
the same block and turn them into copy-on-write mappings in the
hypervisor.
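The scheme Tim describes combines the two mechanisms above: duplicate
reads become shared read-only pages, and a guest write breaks the sharing.
A toy sketch of that combination, with entirely hypothetical names and a
fake frame allocator standing in for the hypervisor:

```python
# Hypothetical sketch: duplicate reads of a block are turned into a
# shared copy-on-write page; a write by one guest "unshares" its copy.

class SharedPages:
    def __init__(self):
        self.block_to_mfn = {}
        self.next_mfn = 0

    def on_read(self, domain, pfn, block):
        """Duplicate-read detection: map every reader of a block to the
        same frame (read-only / COW)."""
        if block not in self.block_to_mfn:
            self.block_to_mfn[block] = self._alloc()
        domain[pfn] = self.block_to_mfn[block]

    def on_write(self, domain, pfn):
        """COW break: give the writing domain a private frame."""
        domain[pfn] = self._alloc()

    def _alloc(self):
        self.next_mfn += 1
        return self.next_mfn

sp = SharedPages()
d1, d2 = {}, {}
sp.on_read(d1, 3, block=7)
sp.on_read(d2, 8, block=7)
assert d1[3] == d2[8]       # duplicate read detected and shared
sp.on_write(d2, 8)
print(d1[3] != d2[8])       # True: the write broke the sharing
```

Copying the page contents on the COW break, and keeping the table
consistent when blocks are rewritten on disk, are the parts this toy
model deliberately leaves out.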

Cheers,

Tim.
--
Tim Deegan <***@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)