The International Journal of Engineering And Science (IJES) ||Volume|| 1 ||Issue|| 1 ||Pages|| 13-17 ||2012|| ISSN: 2319 – 1813 ISBN: 2319 – 1805
Research of elastic management strategy for cloud storage
1 SHAO Bi-lin, *2 BIAN Gen-qing, 3 ZHU Xu-dong
1, First Author: School of Management, Xi’an University of Architecture and Technology
*2, Corresponding Author: School of Information and Control Engineering, Xi’an University of Architecture and Technology
3 School of Information and Control Engineering, Xi’an University of Architecture and Technology
------------------------------Abstract------------------------------
Traditional HDFS deployments suffer from limited storage capacity, high storage cost and costly fault recovery. Cloud storage can address these problems by running HDFS on the virtual resources of an IaaS platform, but this alone does not ensure that cloud storage uses those virtual resources efficiently. To solve this problem, the paper proposes an elastic cloud storage framework based on HDFS, introduces the idea of virtual resource management into the framework, and proposes an elastic management strategy for virtual resource allocation and scheduling based on this framework. The simulation experiment shows that cloud storage can then effectively improve the efficiency with which virtual resources are used.
Keywords: HDFS; Elastic Cloud Storage; Virtual Resource; Resource Allocation and Scheduling; feedback control theory.
Date of Submission: 19 October 2012    Date of Publication: 10 November 2012
I Introduction
With the development of Web 2.0 technology, and especially the popularity of social networks, traditional storage technologies (network storage or distributed storage) can hardly meet the demand for storing vast amounts of unstructured data [1]. More and more commercial websites, such as Google, Amazon and Facebook, have therefore begun to use cloud storage architectures based on HDFS, and this development is gathering momentum. The design of HDFS-based cloud storage derives from virtual cluster computing, which expands a single virtual storage server into a cluster of virtual storage servers (i.e., a Hadoop virtual cluster). A Hadoop virtual cluster combines multiple virtual storage servers so that they behave like one storage server with great computing power and high throughput, and it provides users with a transparent access interface [2]. However, existing cloud storage has limitations in both research and implementation: it concentrates excessively on high throughput and high reliability and ignores how efficiently cloud storage actually uses virtual resources, which seriously hinders the application and popularization of cloud storage. The load-handling and disturbance-switching capabilities of cloud storage are important, but we should be even more concerned about how efficiently cloud storage actually uses virtual resources. Therefore, by introducing the idea of virtual resource management, this article proposes an elastic storage architecture and an elastic management strategy for virtual resources. The simulation results show that cloud storage can then effectively improve the efficiency with which virtual resources are used.
II HDFS on Elastic Cloud Storage
The paper takes the Slot as the scheduling unit of virtual resources. A Slot characterizes the computing capacity of a virtual resource and is determined by the CPU and memory size of that resource. Let cpu denote the unit CPU capacity, taken as 1 GHz, and mem the unit memory capacity, taken as 1 GB. A virtual server P_i can then provide the following number of Slots:
$$Slot_i = \left\lfloor \min\left( \frac{cpu_i}{cpu}, \frac{mem_i}{mem} \right) \right\rfloor \qquad (1)$$
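As a small illustration of Equation (1), the following Python sketch computes the Slot count of one virtual server. The function name `slots_for_server` and the example server size are assumptions made for illustration; only the unit capacities (1 GHz and 1 GB) come from the text.

```python
import math

UNIT_CPU_GHZ = 1.0  # unit CPU capacity "cpu" (1 GHz, as stated in the text)
UNIT_MEM_GB = 1.0   # unit memory capacity "mem" (1 GB, as stated in the text)


def slots_for_server(cpu_ghz: float, mem_gb: float) -> int:
    """Number of Slots one virtual server provides, following Equation (1)."""
    return math.floor(min(cpu_ghz / UNIT_CPU_GHZ, mem_gb / UNIT_MEM_GB))


# A hypothetical server with 2.4 GHz of CPU and 4 GB of memory is CPU-bound.
print(slots_for_server(2.4, 4.0))  # 2
```

The scarcer of the two resources decides the Slot count, so adding memory to this hypothetical server would not add a Slot.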
In Equation (1), ⌊·⌋ denotes rounding down, cpu_i is the CPU size of virtual server P_i, and mem_i is the memory size of virtual server P_i. At time t, let the HDFS load be load_t and the number of virtual servers be N. The available virtual resources are then:
$$Slot_{total} = \sum_{i=1}^{N} Slot_i \qquad (2)$$
and the HDFS virtual resource use rate is:
$$u = \frac{load_t}{Slot_{total}} = \frac{load_t}{\sum_{i=1}^{N} \left\lfloor \min\left( \frac{cpu_i}{cpu}, \frac{mem_i}{mem} \right) \right\rfloor} \qquad (3)$$
If the HDFS virtual resource use rate lies between the expected low and high water levels, the resources are judged to be used effectively. The problem to be solved is therefore to determine dynamically a rule R such that, for a given load load_t, $0 < u_{low} \le u \le u_{high} < 1$. For HDFS to schedule resources elastically according to the real-time load state and resource use rate, an elastic management strategy for virtual resources must be introduced. This strategy should include a flexible resource allocation strategy and a flexible resource scheduling strategy. However, if a new resource management component or resource scheduling strategy is added to the existing HDFS, it will not only increase the complexity of HDFS but also impose additional cost and load on the HDFS manager, and thus affect the overall load capacity and stability of HDFS.
This paper introduces a virtual resource management platform and puts forward an elastic cloud storage system based on HDFS. First, the HDFS cluster is encapsulated as a service on the virtual resource management platform, and each service instance represents an HDFS data node. The platform's lifetime management of service instances thereby realizes lifetime management of HDFS data nodes: dynamically starting, stopping or moving a service instance is equivalent to dynamically starting, stopping or migrating an HDFS data node. Second, the resource management capabilities of the virtual resource management platform provide a flexible virtual resource allocation strategy for HDFS data nodes, which dynamically adjusts the preset resource size of a Hadoop virtual cluster and allows multiple HDFS clusters to share resources. At the same time, an encapsulated virtual resource scheduling service monitors the load information of HDFS and of the virtual resources in real time, and schedules HDFS elastically according to that load information and the elastic scheduling strategy for data nodes. In this way, resource waste and the problem of resource sharing are solved without affecting the existing structure and function of HDFS. Many existing virtual resource management platforms, such as OpenNebula [3][4] and Platform [5], can be used to implement this design. For the simulation test, the virtual resource management product ISF of Platform is used.
Fig 1. Elastic Cloud Storage based on HDFS (architecture diagram; labels from top to bottom: Application Programs with HDFS Clients; Internet/Intranet; Hadoop-based elastic cloud storage, comprising Elastic scheduling service resources, Hadoop Cluster Service (HDFS1), Hadoop Cluster Service (HDFS2) and the Virtual Resource Management Platform (VM Orchestra) with its service instances; VMs on Hypervisor (XEN) and Amazon EC2; Physical resource pool)
As shown in Figure 1, an elastic cloud storage system based on HDFS is made up of four parts: virtual resources, the virtual resource management platform (VM Orchestra), the Hadoop virtual cluster service, and the Hadoop virtual cluster flexible scheduling service.
Virtual resources are the carriers of the HDFS data nodes and are managed by the virtual resource management platform.
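Because virtual resources are counted in Slots, the total of Equation (2), the use rate of Equation (3) and the water-level check can be sketched in Python as follows. The helper names, the sample server sizes and the choice to express load_t in Slot-equivalent units are assumptions made for illustration; the 40% and 85% water levels are the values used in the simulation test of this paper.

```python
import math


def total_slots(servers, unit_cpu=1.0, unit_mem=1.0):
    """Equation (2): sum of per-server Slots, each computed as in Equation (1)."""
    return sum(math.floor(min(cpu / unit_cpu, mem / unit_mem)) for cpu, mem in servers)


def use_rate(load_t, servers):
    """Equation (3): load divided by the total number of Slots."""
    total = total_slots(servers)
    if total == 0:
        raise ValueError("no Slots available")
    return load_t / total


def is_effective(u, u_low, u_high):
    """True when the use rate lies between the low and high water levels."""
    return u_low <= u <= u_high


# Hypothetical servers as (CPU in GHz, memory in GB): 2 + 4 + 1 = 7 Slots.
servers = [(2.0, 4.0), (4.0, 4.0), (1.5, 2.0)]
u = use_rate(3.5, servers)  # load assumed to be measured in Slot-equivalents
print(total_slots(servers), u, is_effective(u, 0.40, 0.85))  # 7 0.5 True
```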
The virtual resource management platform can deploy and run a range of services; for each service configuration, the platform can assign virtual resources and start one or more service instances.
The Hadoop virtual cluster service (HVCS): each HVCS represents a particular HDFS Hadoop cluster and consists of a configuration and one or more instances. In this paper, each instance of an HVCS represents an HDFS data node. The configuration of an HVCS specifies the information HDFS requires at run time and a group of flexible resource allocation strategies, namely the flexible resource allocation strategy for HDFS data nodes.
The Hadoop virtual cluster elastic scheduling service (HVRS): HVRS adds a data node to the HDFS cluster, removes one, or transfers one. To perform this elastic scheduling of HDFS data nodes, it uses the current HDFS resource information and load information, the lifetime control and service scheduling interfaces for HDFS data nodes, and the specified elasticity rules. The resource usage of an HVCS is obtained through the interface provided by the virtual resource management platform, and the HDFS load information is provided by HDFS.
III Virtual Resource Elastic Management Strategy of Elastic Cloud Storage
The virtual resource elastic management strategy of elastic cloud storage has two parts: a flexible resource allocation strategy and a flexible resource scheduling strategy. The flexible resource allocation strategy specifies a set of virtual resources for HDFS, so that resources are available to be scheduled elastically. The elastic scheduling strategy schedules HDFS data nodes elastically according to the usage of virtual resources, so that the virtual resources are used effectively.
IV Flexible Allocation Strategy on Virtual Resource
The virtual resource allocation strategy of elastic cloud storage consists of a reservation strategy, a sharing strategy, a borrowing and lending strategy, and a recycling strategy. The reservation and sharing strategies provide an HCS with a group of static exclusive resources and shared resources. The reservation strategy allocates reserved resources directly to the HCS, regardless of whether the HCS uses them, even when the HCS load is small. The sharing strategy provides shared resources for the HCS in two parts: shared resources belonging to the HCS, and shared resources used under overload. Shared resources are not allocated to the HCS immediately; the HCS can use them, according to the configured sharing proportion, only when it has more demand (that is, it requires more virtual nodes) and its reserved resources have been exhausted. When reserved and shared resources together still cannot meet the HCS's demand for virtual nodes, the sharing strategy allows the HCS to use the free shared resources of other HCSs. The borrowing and lending strategy lets HCSs share reserved resources, and it takes effect only when the shared resources are overloaded. By borrowing and lending idle reserved resources, an HCS can make effective use of the free reserved resources in the Hadoop cluster, which improves the resource use rate of the environment. The recycling strategy reclaims resources that an idle HCS has lent out: while an HCS is idle, other HCSs can use the resources that belong to it, and when its load increases and it needs more virtual nodes, those resources are recycled. The research results show that, compared with conventional HDFS, which uses a fixed group of allocated resources, the resources available to HDFS-based cloud storage after introducing the virtual resource allocation strategy, including those obtained by sharing or borrowing, are regulated dynamically according to the load state.
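The order in which these strategies apply can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Cluster` class, the functions `acquire_slot` and `recycle_slot`, and the one-Slot-at-a-time granularity are all hypothetical, and the configured sharing proportion is ignored for brevity. The Slot counts in the usage example are the HDFS1 and HDFS2 settings from the simulation test of this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    reserved_free: int  # idle reserved Slots owned by this cluster
    shared_free: int    # idle shared Slots owned by this cluster
    held: list = field(default_factory=list)  # (owner name, kind) of Slots in use


def acquire_slot(cluster, others):
    """Grant one Slot to `cluster`, trying the strategies in the order described."""
    # 1. Reservation: the cluster's own reserved Slots come first.
    if cluster.reserved_free > 0:
        cluster.reserved_free -= 1
        cluster.held.append((cluster.name, "reserved"))
        return "reserved"
    # 2. Sharing: its own shared Slots, then free shared Slots of other clusters.
    for owner in [cluster, *others]:
        if owner.shared_free > 0:
            owner.shared_free -= 1
            cluster.held.append((owner.name, "shared"))
            return "shared" if owner is cluster else f"shared from {owner.name}"
    # 3. Borrowing: idle reserved Slots of other clusters, once sharing is exhausted.
    for owner in others:
        if owner.reserved_free > 0:
            owner.reserved_free -= 1
            cluster.held.append((owner.name, "borrowed"))
            return f"borrowed from {owner.name}"
    return None  # nothing left to grant


def recycle_slot(owner, borrower):
    """Recycling: take back one Slot that `borrower` borrowed from `owner`."""
    entry = (owner.name, "borrowed")
    if entry in borrower.held:
        borrower.held.remove(entry)
        owner.reserved_free += 1
        return True
    return False


hdfs1 = Cluster("HDFS1", reserved_free=2, shared_free=1)
hdfs2 = Cluster("HDFS2", reserved_free=1, shared_free=1)
grants = [acquire_slot(hdfs2, [hdfs1]) for _ in range(5)]
print(grants)
```

With an idle HDFS1, the five requests from HDFS2 are served from its own reservation, its own shared Slot, the shared Slot of HDFS1, and then two borrowed reserved Slots of HDFS1; `recycle_slot(hdfs1, hdfs2)` returns one of the borrowed Slots when HDFS1 needs it again.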
3.2 Virtual Resource Elastic Scheduling Strategy
For every HDFS, the data node elastic scheduling strategy is a three-tuple P(S, A, R), where S is the sampling space (Sampling), A is the space of actions to execute (Actions), and R (Rule) represents the scheduling rules.
S, the sampling space, consists of the load information of the virtual nodes, $\{u(l(h_i))\}_{i=1}^{N}$, where i indexes a virtual node, h_i is the load capacity of virtual node i, l(h_i) is its load, and u(l(h_i)) is its resource consumption. Then
$$S = \{ S(h_i) \mid \{ u(l(h_i)) \mid 1 \le i \le N \} \} \qquad (4)$$
A is the set of actions that elastic scheduling can execute, A = (A_over, A_wasted), where A_over is the action that handles resource overload and A_wasted is the action that handles resource waste.
From Equation (3), the scheduling rules R are defined as follows:
* If the current HDFS resource use rate u is higher than u_high, resource usage is judged to be overloaded, and HCRS performs the overload action A_over on the HDFS data nodes; that is, if u > u_high, then A_over.
* If the current HDFS resource use rate u is lower than u_low, HDFS is judged to be wasting resources, and HCRS performs the waste-handling action A_wasted on the HDFS data nodes; that is, if u < u_low, then A_wasted.
Elastic scheduling is the process of tracking the HDFS load information $\{l(h_i)\}_{i=1}^{N}$ and, based on the scheduling rules R, adjusting the Slot_total of HDFS in time, which realizes the dispatching of virtual resources. To automate this elastic scheduling of virtual resources, classical feedback control theory [6][7] is introduced to model the relation between the HDFS load information $L = \{l(h_i)\}_{i=1}^{N}$ and Slot_total, i.e.:
$$L(t) = s_1 L(t-1) + r_1 Slot_{total}(t-1) + r_2 Slot_{total}(t-2) \qquad (5)$$
The model parameters s_1, r_1 and r_2 can be estimated with the Recursive Least-Squares (RLS) method. To realize elastic control based on this model, Equation (5) is shifted forward by one time step and solved for Slot_total(t), which gives the control equation (6):
$$Slot_{total}(t) = \frac{1}{r_1} L_{ref} - \frac{s_1}{r_1} L(t) - \frac{r_2}{r_1} Slot_{total}(t-1) \qquad (6)$$
where L_ref = L(t+1) is the HDFS load expected at the next time step.
Introducing the scheduling rules R into the control equation (6) gives the elastic control equation (7):
$$Slot_{total}(t) = \begin{cases} \frac{1}{r_1} L_{high} - \frac{s_1}{r_1} L(t) - \frac{r_2}{r_1} Slot_{total}(t-1) & \text{if } L_{high} < L(t) \\ \frac{1}{r_1} L_{low} - \frac{s_1}{r_1} L(t) - \frac{r_2}{r_1} Slot_{total}(t-1) & \text{if } L_{low} > L(t) \\ Slot_{total}(t-1) & \text{otherwise} \end{cases} \qquad (7)$$
That is, the HSC acts only when L_high < L(t) (load higher than supply) or L_low > L(t) (load lower than supply); it then sets a new Slot_total(t), which means executing A_over or A_wasted.
V Simulation test
The virtual resource elastic management strategies are tested as follows. Virtual resource management platform: the virtual resource management product ISF of Platform, with 8 adjustable virtual resource Slots. The HDFS1 cluster contains 2 Slots of reserved resources and 1 Slot of shared resources; at the same time, the HDFS2 cluster contains 1 Slot of resource
reservation and 1 Slot of shared resources. Load is sampled by measuring the CPU use rate; the high water level u_high and the low water level u_low are set to 85% and 40% respectively; A_over adds one Slot and A_wasted removes one Slot; the HDFS testing tool is HDFS Bench. The test tool is run against each of the 2 HDFS clusters over a period of time, and the virtual resource (Slot) usage of the 2 virtual clusters is observed as the load changes. The test results are shown in Figure 2.
Fig 2. Simulation Result
The results show that when the HDFS1 load is small, it uses only one Slot, while the remaining resources are made available to the highly loaded HDFS2. As the HDFS1 load increases, the shared Slots and borrowed Slots used by HDFS2 are gradually returned to HDFS1, after which the clusters use their Slots at maximum load. As the HDFS2 load gradually decreases, HDFS2 gradually reduces its use of Slots, and the released Slots are gradually taken up by the fully loaded HDFS1, which shares and borrows Slots from HDFS2 and so gradually reduces its load. Along with its load reduction, HDFS2 gradually reduces its use of Slots, and its usage tends to level off.
VI Conclusion
By studying the existing ways of building HDFS and controlling data nodes, this paper finds that the low utilization of system resources by HDFS is caused by wasted resources and by the difficulty of sharing resources among multiple HDFS clusters, and that both stem from the lack of a flexible resource management pattern. To address this problem, the paper proposes an elastic cloud storage framework based on HDFS and, according to its characteristics, puts forward a flexible resource allocation strategy and an elastic scheduling strategy for HDFS data nodes. The simulation experiment shows that the elastic management strategy can alleviate these problems of HDFS and effectively improve the efficiency with which virtual resources are used.
Acknowledgment
This research is supported by the National Natural Science Foundation of China (61073196 & 61272458) and the Natural Science Research Foundation of Shanxi Province, China (2011JM8026).
Reference
[1] M. Armbrust, A. Fox, D. A. Patterson, N. Lanham, B. Trushkowsky, J. Trutna, and H. Oh. SCADS: Scale-independent storage for social computing applications. In Proc. of CIDR, 2009.
[2] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. In Proc. of SOSP, 2003. Volume 37, Issue 5, December 2003, Pages: 29-43.
[3] OpenNebula. http://opennebula.org.
[4] B. Sotomayor, R. Montero, I. Llorente, and I. Foster. Capacity Leasing in Cloud Systems using the OpenNebula Engine. In Workshop on Cloud Computing and its Applications (CCA08), 2009.
[5] Platform: http://www.platform.com/
[6] X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, P. Padala, and K. Shin. What does control theory bring to systems research? SIGOPS Operating Systems Review, 2009, 43(1):62-69.
[7] H. C. Lim, S. Babu, J. S. Chase, and S. S. Parekh. Automated control in cloud computing: Challenges and opportunities. ACDC '09 Proceedings of the 1st workshop on Automated control for datacenters and clouds. Volume: C, Issue: 09, Publisher: ACM, Pages: 13-18.