Aditya Gadre
2010-10-09 17:56:23 UTC
Our aim is to implement memory deduplication in Xen with minimal overhead.
Our approach to deduplication is as follows:
In most cases, DomUs run one of a small set of well-known operating systems
such as Linux, FreeBSD and Microsoft Windows. In such an environment, many
domains share read-only filesystems that contain the operating system and
frequently used program files and libraries, while each domain has its own
writable filesystem for storing data and temporary files. In this
configuration, many pages scattered across different domains hold the
contents of the same disk block. To deduplicate them, we intend to add a
data structure in Dom0 that records the disk block number and the machine
frame number (MFN) whenever a read request for read-only code (or data) is
served. When another DomU later requests the same block and Dom0 receives
the I/O (DMA) request, it will first look the block up in this data
structure. If an entry is found, Dom0 will map the MFN of the already-read
page to the requesting domain's PFN instead of reading the block again, so
no disk I/O is needed for blocks that have already been read. This
deduplicates the read-only pages accessed by multiple domains without the
overhead of hashing page contents.
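
To make the lookup concrete, here is a minimal sketch in Python of the table
described above. It is only a toy model, not Xen code: the class BlockCache,
its fields, and the MFN numbering are all hypothetical names chosen for
illustration, and the "disk read" is simulated by a counter.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class BlockCache:
    """Toy model of the proposed Dom0 table (hypothetical, not a Xen API)."""
    block_to_mfn: dict = field(default_factory=dict)  # disk block -> MFN
    p2m: dict = field(default_factory=dict)           # (domid, pfn) -> MFN
    disk_reads: int = 0                               # simulated disk reads
    _next_mfn: count = field(default_factory=lambda: count(0x1000))

    def read_block(self, domid, pfn, block):
        """Serve a read of a read-only block for domain `domid` into `pfn`."""
        mfn = self.block_to_mfn.get(block)
        if mfn is None:
            # Miss: model a real disk read into a freshly allocated frame.
            mfn = next(self._next_mfn)
            self.disk_reads += 1
            self.block_to_mfn[block] = mfn
        # Hit or miss, the guest's PFN ends up backed by the shared MFN.
        self.p2m[(domid, pfn)] = mfn
        return mfn


cache = BlockCache()
first = cache.read_block(domid=1, pfn=0x10, block=42)   # miss, reads disk
second = cache.read_block(domid=2, pfn=0x99, block=42)  # hit, no disk read
```

In this sketch both domains end up mapped to the same MFN after a single
simulated disk read. A real implementation would also have to keep the
shared page read-only and track how many domains reference each MFN.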
Test case scenario:
Consider a Dom0 Linux kernel using a filesystem with deduplication enabled.
We install a DomU with its virtual disk stored as an image file (.img) on
that filesystem, then make multiple copies of the image to deploy multiple
DomUs running the same kernel. Because deduplication is enabled in the
filesystem, all blocks of the copied images initially point to the same
disk blocks. When the DomU kernels are booted, the programs (code segments)
they load will therefore consume memory only once. As the guest OSs start
writing to their own virtual filesystems, the filesystem will copy-on-write
(COW) the affected blocks of the image, giving them different block numbers.
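
The block-number behaviour this scenario relies on can be sketched as
follows. This is a toy Python model under stated assumptions, not the
behaviour of any particular filesystem: CowImageStore, its methods, and the
image names are hypothetical, and physical block numbers are simply handed
out from a counter.

```python
from itertools import count


class CowImageStore:
    """Toy model of image files on a deduplicating, copy-on-write filesystem."""

    def __init__(self):
        self._next_block = count(100)  # next free physical block number
        self.images = {}               # image name -> {logical: physical}

    def create(self, name, n_blocks):
        self.images[name] = {i: next(self._next_block) for i in range(n_blocks)}

    def copy(self, src, dst):
        # A deduplicated copy shares every physical block with its source.
        self.images[dst] = dict(self.images[src])

    def write(self, name, logical):
        # Copy-on-write: the written block moves to a new physical block.
        self.images[name][logical] = next(self._next_block)

    def physical(self, name, logical):
        return self.images[name][logical]


store = CowImageStore()
store.create("base.img", n_blocks=4)
store.copy("base.img", "domU1.img")
store.copy("base.img", "domU2.img")
shared = store.physical("domU1.img", 0) == store.physical("domU2.img", 0)
store.write("domU1.img", 0)  # domU1 now has a private block 0
diverged = store.physical("domU1.img", 0) != store.physical("domU2.img", 0)
```

Since a written block receives a new physical number, a table keyed by
physical block number would stop matching it, while unwritten blocks keep
resolving to the shared entry.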
Has such an approach already been implemented? We intend to implement this
as a project. What challenges should we expect?
Regards,
Aditya Gadre