Management of Multi-Channel Multi-Container Application Servers

Kareim Sobhe, Ahmed Sameh
The American University in Cairo, P.O.Box 2511, Cairo, Egypt
Prince Sultan University, P.O.Box 66833, Riyadh, Saudi Arabia
asameh@cis.psu.edu.sa, Sameh.aucegypt.edu@gmail.com

Abstract: We propose a Multi-Channel Multi-Container web application server that allows a single web request to be divided into independent portions executed in parallel over different communication channels. To achieve this, the underlying communication infrastructure of traditional web environments is changed from the stateful TCP protocol to the stateless UDP protocol. This architectural change provides an environment suitable for parallelization and distribution, and enhances other existing characteristics of current web environments such as fault tolerance and web caching. In a previous paper we presented the detailed architecture and the results of a rapid prototype. In this paper, performance is further enhanced as part of the framework upon which web applications will run by introducing a number of changes. The proposed architectural changes are transparent to web clients, keeping their interfaces unchanged. The target applications that benefit from the multichannel environment are web applications that require intensive processing relative to data transfer and that are decomposable by nature; such application profiles demonstrate a high performance gain under the multichannel architecture compared to the traditional one. The importance of the Cluster Manager, referred to as the CLM throughout this paper, lies in the fact that it takes the new web environment from a cascaded cluster of replicated web application server nodes that run independently to serve dispatched requests, to a collaborative cluster whose nodes work in a tightly coupled fashion to provide distributed resources for serving a single request by passing different kinds of messages between nodes through the CLM. The CLM provides many services, all designed for cluster-based functionality. Some of these services are informative, providing information about cluster nodes to other nodes within the same cluster as well as to HPAs (for example, the discovery service); others are transactional, in which an action is carried out by one or more nodes in the cluster, such as service state replication and migration. Thus, as we will see, the CLM in the multi-channel multi-container environment has two main groups of tasks: client tasks and server tasks. On the client side, the CLM gives the HPA cluster information that enables the HPA's embedded dispatcher to direct requests to the least loaded nodes, provides the HPA with the information needed for channel reservation and initialization, and assists the HPA in takeover situations resulting from failures. On the server side, the CLM keeps all cluster nodes in sync, provides mechanisms for running services to replicate their state between nodes over the distributed replicated shared memory cluster engine, and carries out intermediate tasks between the Deployment Manager and the service factories of the different nodes to enable cross-cluster service deployment.

Keywords: Multi-Channel, Web Application Server, Clustering, High Availability, Service State Migration, High Performance Computing, High Performance Agent, Skeleton Caching

I- INTRODUCTION
The architecture of the proposed web application server [1] is made up of two main components: the Container and the High Performance Agent (HPA). The Container is a normal application deployment server which can load application component instances in the form of services and provide the resources required for them to execute and function. The Container supports a UDP-based communication layer through which all communication between any Container and its clients runs over a stateful communication protocol built on top of UDP. An important question pops up: why build a stateful communication protocol over UDP while the TCP protocol exists? The answer can be summarized in three points:
1. The TCP protocol is too general, with a lot of overhead to accommodate the general features designed to serve any kind of communication sequence between any two entities. A minimized version of TCP can be implemented that removes this overhead and is specific to web transactions on a request/reply basis only.
2. As the tests presented below show, more web transactions can be handled over UDP than over the normal TCP used in current web servers; this makes it worthwhile to utilize concurrent channels to serve one web transaction from different container nodes.
3. A deviation from the TCP protocol is needed to be able to change the source of the data stream at any point in time. A container which is sending a web transaction reply to a specific client must be able, at any point in time, to delegate the execution of that web transaction to another container located physically on another container node, which will resume the sending of the data stream and hence the whole web transaction. This capability provides an infrastructure for fault tolerance through service takeover.
Since a Container cannot communicate except through a proprietary protocol based on UDP, and since normal web clients communicate with web servers using HTTP over TCP, an intermediate translator is necessary to bridge the gap and enable the web client to transparently send its requests to the Container. Thus, the High Performance Agent component is introduced, referred to throughout this paper as the HPA. Acting as a reverse proxy, the HPA is located physically on the machine from which the web client initiates its web requests. Unlike a normal proxy, the HPA proxies between a web client and a Container over different communication protocols: the HPA communicates with the web client through normal HTTP over TCP and translates those client requests to the Container through an extended HTTP protocol over UDP. The HPA is designed as a reverse proxy because, unlike normal proxies, a reverse proxy serves a specific destination or a set of destinations.
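To make point 3 concrete, the following minimal Java sketch shows an HPA-side receive loop that matches reply datagrams by a transaction identifier carried in the payload rather than by the peer address of a connection, which is what allows a different container node to resume the stream after a takeover. The 8-byte header layout (transaction id followed by a sequence number) and the port number are assumptions made for illustration; they are not the actual wire format of the proposed protocol.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.nio.ByteBuffer;

// Hypothetical HPA-side receive loop: datagrams are matched by a transaction
// id carried in the payload, not by the peer address of a connection, so the
// sending container may change mid-transaction after a takeover.
public class TakeoverAwareReceiver {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket sock = new DatagramSocket(9099)) {   // assumed HPA port
            byte[] buf = new byte[1500];                          // one MTU-sized datagram
            int expectedTx = 1, expectedSeq = 0;
            while (true) {
                DatagramPacket p = new DatagramPacket(buf, buf.length);
                sock.receive(p);                                  // sender may be any container node
                if (p.getLength() < 8) continue;                  // malformed packet, ignore
                ByteBuffer b = ByteBuffer.wrap(p.getData(), 0, p.getLength());
                int txId = b.getInt();
                int seq = b.getInt();
                if (txId != expectedTx) continue;                 // not our transaction
                if (seq < expectedSeq) continue;                  // duplicate after a takeover, discard
                byte[] payload = new byte[b.remaining()];
                b.get(payload);
                // forward payload to the web client over the TCP side (omitted)
                expectedSeq = seq + 1;
                System.out.println("chunk " + seq + " from " + p.getAddress());
            }
        }
    }
}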
In a realistic situation, the HPA is not considered an overhead, as it is located on the client machine, very tightly coupled with the web client, and serves only the normal load of a single user's web transactions. Figure 1 shows the proposed new architecture.

Figure 1: The Proposed Multi-Channel Web Environment based on UDP

I.1 Container
The proposed Container is a web application server deployment container with all the subsystems needed to carry out the basic functionalities of a normal web application server deployment container: loading application business logic components in the form of loadable service components, and providing them with the necessary resources to operate and function. The Container has a class hierarchy that any service needs to extend in order to be deployed in the Container. Services should be developed and implemented in the development technology that a container supports; the proposed environment will support hybrid development and runtime technology types of containers, which are all replicas in architecture and provide the same communication interface. There will therefore be C++ Containers and Java Containers, and perhaps in the future Perl, PHP, or Python Containers, where the responsibility of each container type is to host services developed with its supported technology in mind; for example, the C++ container hosts services developed in C++. As will be seen in the next sections, a web request can be broken down into portions that may run on container nodes of different development technologies, and those hybrid services can exchange messages. A container node has a multi-threaded communication layer with pre-allocated communication sockets to communicate concurrently with different clients. A service factory is required to load service instances in ready-to-execute threads to assign to service requests coming from the clients. The service factory loads services that are defined in the container configuration files, so a configuration manager subsystem is needed to parse and load the configuration files which define the container's settings, such as the communication port range the container should acquire, the maximum number of communication threads, the names of the services the container should load, the number of instances to be instantiated from each service type, the location of the multi-channel server side scripts called skeletons, and so on. The container node also has a dispatcher which dispatches incoming web transactions to the correct services to handle the request, and a communication buffer manager to assign and manage the communication buffers allocated to dispatched services.

As can be seen, many resources are allocated by a container node, such as communication threads, communication buffers, and memory and thread resources for instantiated service instances; therefore a garbage collector is needed for environment housekeeping so that expired resources can be reinitialized and reused for subsequent requests. Each component has its own garbage collection module. For example, the factory is able to clean up and reacquire terminated service instances after they finish execution, the communication layer is able to clean up finished communication channels and reinitialize them for further reuse, and the communication buffer manager is able to deallocate expired unused communication buffers. Figure 2 shows the internal architecture of a container node irrespective of its supported development and runtime technology.

Figure 2: Container Node Architecture

So far, the architecture presented serves a single container; a cluster management subsystem is therefore added to enable message exchange between different container nodes, which supports the proposed multichannel mechanisms and through which service state migration, discussed later, provides a better infrastructure for fault tolerance. To deploy services easily, a deployment manager subsystem works closely with the cluster management subsystem to enable the clustered deployment of services, which includes the replication of service images on the clustered container nodes. In fact, the deployment manager uses the cluster management subsystem's APIs and interfaces to carry out cross-cluster deployment operations. Each Container type has an embedded class hierarchy, all of which follow the same design and functionality as much as possible. For a service to be deployed in a specific container it should extend an Ancestor class which is provided by all container types. The Ancestor class basically has two sets of methods. The first set consists of basic virtual methods for services to extend and overload, such as the main method which is called by the service factory when a service is dispatched to serve a web request. The other set of methods encapsulates functionality that is carried out by the container on behalf of services, such as reading and parsing the HTTP request header and posted data, as well as composing the HTTP reply header. It is very important that the service developer be aware of the container class hierarchy and its interfaces in order to utilize the container functionalities and its internal infrastructure.
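As an illustration of this class hierarchy, the following hedged Java sketch shows what a deployable service might look like. The names AncestorService, readRequestHeader, composeReplyHeader and write are hypothetical stand-ins for the container's actual Ancestor class and its two sets of methods, not the real API.

// Sketch of the two sets of methods of a hypothetical Ancestor class and a
// service that extends it; names and signatures are illustrative only.
abstract class AncestorService {
    // first set: virtual methods the service overrides
    public abstract void main() throws Exception;                       // called by the service factory on dispatch
    // second set: functionality the container carries out on behalf of services
    protected String readRequestHeader(String field) { return ""; }     // parse HTTP request header / posted data
    protected void composeReplyHeader(String contentType) { }           // build the HTTP reply header
    protected void write(byte[] data) { }                               // send data on the assigned channel
}

class HelloService extends AncestorService {
    @Override
    public void main() throws Exception {
        String name = readRequestHeader("X-Client-Name");  // the container parsed the request for us
        composeReplyHeader("text/html");
        write(("<html><body>Hello " + name + "</body></html>").getBytes());
    }
}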
The container can serve two types of services, which are designed to enable the developer of application components to build applications in a decomposable way that allows the concurrent execution of services and the delivery of their results over multiple communication channels:
1- Single Channel Services: The first type is the Single Channel Service, which we define as the smallest executable entity that can run independently. A Single Channel Service is considered the indivisible building block of an application component which can be used to build up more complex services, providing reusability and extensibility. As the name indicates, the most important architectural feature of a Single Channel Service is that it communicates over a single communication channel, which is based on UDP communication. The direct client of a Single Channel Service is the HPA, which acts as an interface agent between the service and the web client. A Single Channel Service can be visualized as a Java Servlet which runs in the application server environment and delivers web results to the client.
2- Skeleton Services: Since the Single Channel Service does not differ in concept from a normal web application component, a way is needed to group those independent basic components, the Single Channel Services, into more complex services able to run those components in parallel to improve performance. A Skeleton Service is basically a server side in-line script which follows the normal structure of regular web server side in-line scripts such as PHP or ASP. Some features are added to the Skeleton to achieve multichannel and parallel execution, such as a parallelization construct on each in-line code section in the skeleton, as well as a type construct defining the development environment of each in-line code section. The developer writes the skeleton source file, which is a hybrid of static content and in-line code sections defining the dynamic parts. The deployment manager then takes the source of the Skeleton as input to generate the skeleton map and to add an independent single channel service for each concurrent in-line script section. The skeleton map is a map that will be used by the HPA to identify each concurrent service that needs to be requested from the containers in parallel.
The communication layer of the Container is based on a special stateful protocol built on top of UDP, sufficient to serve the web application communication needs of a single request-reply communication sequence. The communication layer consists of multi-threaded components that allow the container to handle multiple communication channels simultaneously and to service multiple requests concurrently. The container does not perceive the relation between different channels; rather, from the container perspective each communication channel is assigned to a service which either serves a normal service or transfers a skeleton map to the HPA, both of which require a single channel. It is the HPA which initiates multiple communication channels to different containers to serve a complex service defined by a skeleton map. When a request arrives from the HPA, the container starts by validating the client. On successful validation the communication layer passes the HTTP request to the service dispatcher, which evaluates the HTTP request and, with the help of the service factory, assigns a communication channel to a service to serve the requested web transaction. After the transaction finishes, the communication layer subsystem is responsible for cleaning up the communication channel and re-initializing it to serve future requests. When the HPA initially tries to communicate with a container node, it does so on a default administrative port through which it is assigned a range of service ports over which it can request services from the container. The HPA is then able to communicate with any container node in the cluster over the same range of communication ports. The communication layer, with the help of the cluster management subsystem, assigns the HPA a free range of ports and replicates this assignment to all container nodes in the cluster. After a specific idle time from a specific client, the port range assignment is cleared and the HPA client needs to claim a port range again.
The Service Manager subsystem is composed mainly of the Service Manager and the Service Dispatcher, which are concerned with the service status in all stages of operation. First a service is loaded by the service factory, at which point it is ready to serve requests. When a request arrives and the service dispatcher decides on the type of service that should serve it, the dispatcher asks the service factory to avail a service instance for this request; at this point the service is assigned by the dispatcher to the communication channel as well as a communication buffer, its status is changed to operational and dispatched, and it resides in the active service pool. When the service finishes serving the request, it is garbage collected by the service factory, returned to the ready-to-use status, and transferred back to the ready-to-execute service pool. Figure 3 gives an abstract view of the Service Manager and how the Service Dispatcher interacts with the Service Factory.

Figure 3: Service Factory
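The service life cycle managed by the Service Manager described above (ready-to-execute pool, dispatch, active pool, garbage collection) can be pictured with the following rough Java sketch. The class is hypothetical and omits threading details and channel assignment; it only illustrates the pool transitions.

import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical factory: ready-to-execute instances move to the active pool
// when dispatched and return to the ready pool when garbage collected.
class ServiceFactorySketch {
    private final Queue<Runnable> readyPool = new ArrayDeque<>();
    private final Queue<Runnable> activePool = new ArrayDeque<>();

    ServiceFactorySketch(int instances, java.util.function.Supplier<Runnable> maker) {
        for (int i = 0; i < instances; i++) readyPool.add(maker.get());  // pre-instantiate per configuration
    }

    synchronized Runnable dispatch() {
        Runnable s = readyPool.poll();        // avail an instance for the dispatcher
        if (s != null) activePool.add(s);     // status: operational and dispatched
        return s;                             // null means all instances are busy
    }

    synchronized void garbageCollect(Runnable s) {
        if (activePool.remove(s)) readyPool.add(s);   // back to the ready-to-execute pool
    }
}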
The Configuration Manager subsystem is responsible for reading configuration information from configuration sources, which are all XML-based and require XML parsing, and for storing it in internal structures ready for use by the different subsystems; for example, the range of ports to be used by the communication layer and the number of instances for a specific service type. With the help of the Cluster Management System, the Configuration Manager is also capable of distributing configuration structures over the different container nodes in the web cluster. The Administration Manager is an interface layer between the human administrator of the container web cluster and the container nodes. It enables the administrator to pass administration commands to the container nodes; with the help of the Cluster Management System, the commands issued by the administrator can run transparently on multiple container nodes, providing a single system image (SSI) for the whole container web cluster. The Deployment Manager is responsible for deploying the services provided by the application developers and replicating the deployment over the different cluster container nodes with the help of the Cluster Management System. The deployment manager can deploy single channel services as back-end components as well as multichannel services represented as server side in-line scripts. The developer provides the multichannel in-line script; the deployment manager then parses the script, extracts each piece of code defined as a separate thread, and generates the single channel service source code for it. The deployment manager then compiles the services, generates whatever error or warning messages apply, and sends them to the deployment administrator. The deployment manager chooses the correct compiler for each code section according to its type, meaning that sections written in C++ are compiled with GCC, for example, and sections written in Java are compiled with an appropriate Java compiler. On successful compilation of the services constructed from the in-line script definitions, the deployment agent deploys those services across the container cluster nodes according to their types: C++ single channel services are replicated over C++ containers, and Java services are replicated over Java containers. It is important to state that replication constructs and rules can be applied to the service replication. The default may be an equal distribution of the services, but another deployment scheme might take into consideration the amount of memory and the speed of the CPU of each container node. After the single channel services are compiled and deployed successfully, the deployment manager generates a skeleton map for the in-line script and replicates it over the cluster nodes. The skeleton map contains pointers to the target single channel services indicating their primary and secondary locations in case of failures. A service pointer is composed of an HTTP-like header of the request for the single channel service, with a little room for extra information about the service such as alternative service locations.
The Cluster Management System is the subsystem that is responsible for the exchange of information between different containers. The Cluster Management System enables the deployment manager to distribute newly deployed services as well as modified ones. The Cluster Management System is also responsible for transparently executing administration commands issued by the environment administrator over all the nodes of the cluster, which eases the administration of the web cluster and makes it appear as a single system to the administrator. Moreover, the Cluster Management Subsystem is responsible for all the communication necessary to carry out service state migration.

I.2 HPA
The proposed High Performance Agent is the agent that the whole system depends on. The HPA acts as a multi-protocol reverse proxy between the Container and the web client. It acts as a web server for the web client and as the agent which understands the constructs sent by the container to split the communication stream into multiple channels, which enables the overlapping of delays from which the enhanced performance should come. How the gears work together is shown in the work flow section. The communication layer of the HPA is a multi-protocol, double-edged communication layer. It can be viewed as two separate communication layers that communicate with each other. The first is a standard multi-threaded TCP communication layer that can handle multiple web transactions concurrently. The second, UDP-based, communication layer is responsible for communicating with the back-end containers. A request is initiated by a web client through an HTTP over TCP connection. When the request arrives at the HPA, the HPA uses one of the already established UDP connections with the container environment, and a discovery request is initiated to identify the node that this request will be served from. A cache of discovery results in the HPA is updated to eliminate unnecessary communication. Finally, the request is served from the container to the HPA over UDP and the data stream is transferred to the web client over TCP, transparently to the web client. The TCP and UDP communication runs in two different threads to avoid dependent communication blocking, hence a buffering mechanism is needed between the two threads to provide the data storage that postpones mutual communication blocking between the two communication threads. Of course, when the communication buffer is fully occupied, the UDP receiver waits until the TCP sender starts spooling data from the buffer, and vice versa. Figure 4 gives an overall view of the communication mechanism between the web client and the container with the intermediate agent, the HPA, in the middle.

Figure 4: Communication flow between web client and container with HPA in the middle
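The buffering between the HPA's two communication threads can be sketched with a bounded queue, as below; the chunk size and queue capacity are assumptions, and the real HPA buffer manager is of course more elaborate. The sketch only shows the blocking behaviour described above: the UDP receiver blocks when the buffer is full and the TCP sender blocks when it is empty.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal producer/consumer sketch of the HPA buffer between the UDP receiver
// thread and the TCP sender thread; capacity and chunk size are assumptions.
public class HpaBufferSketch {
    public static void main(String[] args) throws Exception {
        BlockingQueue<byte[]> buffer = new ArrayBlockingQueue<>(256);   // ~256 chunks in flight

        Thread udpReceiver = new Thread(() -> {
            try {
                for (int seq = 0; seq < 1000; seq++) {
                    byte[] chunk = new byte[1400];        // stand-in for a received UDP payload
                    buffer.put(chunk);                    // blocks when the buffer is fully occupied
                }
            } catch (InterruptedException ignored) { }
        });

        Thread tcpSender = new Thread(() -> {
            try {
                for (int seq = 0; seq < 1000; seq++) {
                    byte[] chunk = buffer.take();         // blocks until the receiver has produced data
                    // write chunk to the web client's TCP socket here
                }
            } catch (InterruptedException ignored) { }
        });

        udpReceiver.start();
        tcpSender.start();
        udpReceiver.join();
        tcpSender.join();
    }
}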
Obviously, a server side in-line script contains some static content, and every time a server side script is requested by the client, the skeleton map for that script has to be fetched from the container for the HPA to establish the single channel requests required to serve the server side script request. The connection required to fetch the skeleton map is an overhead, hence adding a cache module to the HPA to keep unmodified versions of skeleton maps achieves two things: 1) it eliminates the extra connection needed for skeleton fetching, and 2) it caches some of the static content that is embedded in the dynamic content generated by the back-end services. The static sections of the skeleton are thus cached by the HPA, and the impact of this depends on the size of the cacheable areas. In current scripting environments such caching is not possible, as the client has no clue which parts of the UI (e.g., the HTML) are static and which parts are generated by a back-end business logic engine; the client, the HPA in our case, has no access to the business logic source code. The Discovery client is the module that is responsible for advising the HPA of the locations of services through communication with the Container discovery service. Caching is applied to eliminate unneeded communication as much as possible.

II- WORK FLOW
The work flow of requests and mechanisms is discussed to highlight how the gears really move, and how the bits and pieces of the whole environment cooperate and collaborate to serve web requests using the multichannel mechanism. A file spooler is used as an example to clarify the three scenarios presented: the Single Channel scenario, the Multi-Channel scenario, and the Service State Migration scenario. Work flow figures provide a visualization of each scenario.
II.1 Single Channel Scenario
The proposed Single Channel Scenario is the basic building block upon which the multichannel scenario is built. A special one-container case of Figure 5 illustrates the work flow of the single channel scenario. The scenario starts with a web client using the HPA, installed on the same machine and operating on the loopback address, to initiate a single channel request to a container node. The request is in normal URI structure, which contains the name of the container node that the requested service resides on and the name of the service to be executed. The request is sent to the HPA over TCP. The HPA evaluates the request and identifies it as a single channel request. The HPA then opens a UDP connection to the container node specified in the URI and passes the request to it. The container then dispatches the request to the correct service instance to serve the request. The stream returned by the service to the HPA over UDP is sent to the client over TCP. As can be seen from the figure, the UDP communication is carried out in parallel with the TCP communication, which allows a pipelined communication mechanism that eliminates overhead and increases speed.

Figure 5: Multi-channel Scenario Work Flow

II.2 Multi-channel Scenario
The proposed multichannel scenario is based on the single channel scenario: a web transaction is broken down into a number of single channel services that are distributed, executed concurrently, and serve their content over parallel communication channels. Figure 5 illustrates the work flow of the multichannel scenario. The request reaches the HPA over TCP exactly as in the previous scenario. The HPA evaluates the request and identifies it as a multichannel request by the service name extension .skel. The HPA then makes the necessary updates to its skeleton cache, fetches the skeleton data structure from its cache, and identifies the different single channel requests needed. The HPA then spawns a thread for each single channel request to different container nodes, according to the information in the skeleton map of the multichannel service. The HPA returns the replies of the channels to the web client over TCP as they arrive, according to their chronological order, which entails some buffering and blocking techniques. For example, if the second channel finishes before the first channel, the second channel content must be buffered on the HPA side until the first channel finishes, during which time the second communication channel is blocked.
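The fan-out just described can be sketched as follows; fetchChannel() is a placeholder for the UDP request to a container node, and the skeleton map entries are invented for illustration. The point of the sketch is the ordering: replies are written back to the web client in skeleton order, so later channels are implicitly buffered until earlier ones complete.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// One task per single-channel entry of the skeleton map, executed concurrently;
// replies are delivered to the client in their original (chronological) order.
public class MultiChannelSketch {
    static byte[] fetchChannel(String serviceUri) {
        return ("<!-- content of " + serviceUri + " -->").getBytes();   // placeholder for the UDP exchange
    }

    public static void main(String[] args) throws Exception {
        List<String> skeletonMap = List.of(        // hypothetical skeleton map entries
                "udp://node1/partA", "udp://node2/partB", "udp://node3/partC");
        ExecutorService pool = Executors.newFixedThreadPool(skeletonMap.size());

        List<Future<byte[]>> replies = new ArrayList<>();
        for (String uri : skeletonMap) {
            Callable<byte[]> task = () -> fetchChannel(uri);   // one thread per channel
            replies.add(pool.submit(task));
        }

        for (Future<byte[]> f : replies) {
            byte[] part = f.get();     // later channels wait, buffered, until earlier ones finish
            System.out.write(part);    // stands in for the TCP write to the web client
        }
        System.out.flush();
        pool.shutdown();
    }
}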
III- CLUSTER MANAGER (CLM)
The proposed CLM provides many services, all designed for cluster-based functionality. Some of these services are informative, providing information about cluster nodes to other nodes within the same cluster as well as to HPAs (for example, the discovery service); others are transactional, in which an action is carried out by one or more nodes in the cluster, such as service state replication and migration. The CLM is thus the backbone infrastructure for all the communication, in the form of message passing, between different cluster nodes and in some cases between HPAs and cluster nodes; consequently, the set of services that the CLM needs to carry out dictates its architecture, which reflects on the whole web environment architecture. Throughout this paper we demonstrate the architecture of the new proposed web environment with respect to the CLM, the services the CLM provides, and the detailed mechanisms those services adopt to operate.
All current midrange web environments that support clusters, such as WebSphere, WebLogic, and Apache/Tomcat, are based on the dispatcher model. The dispatcher in this case carries out two main tasks: the scheduling task, which decides which node in the cluster should serve the next incoming request, and the physical dispatching mechanism, which is based in most cases on either layer-4 or layer-7 switching [2]. Many scheduling techniques are available, but for such scheduling techniques to operate efficiently they need to have some sort of feedback about the status of the nodes, and more precisely about the status of the services that serve web requests on each node of the cluster, as the state of the physical node machine might not be a good indication. Many techniques were invented for this; one of the most famous is the IBM WebSphere Edge Server Advisor component, which feeds back to the layer-4 dispatcher advisory information about the load of the cluster node it is running on, and a layer-7 caching reverse proxy is also provided within the WebSphere suite that enables layer-7 dispatching based on the advisory component. As can be seen, however, these dispatching mechanisms treat the transaction as the smallest unit of distribution, so a transaction needs at least two physical TCP connections and in the end resides on one node that serves it. The same goes for the Grid, although plug-ins and wrappers were invented for HPC tools such as MPI, OpenMP, and PVM to be integrated into grid environments to solve this problem. A grid has a dispatcher and a scheduler to assign a task to the best fitting candidate grid node as a first level of dispatching. The second level of dispatching is then handled by the integrated HPC tool, which breaks the task down into smaller tasks and executes them in a distributed fashion; many nice features are inherited in this model, enabling collaborative execution of a single task and exchanging data over the HPC infrastructure. Some attempts have been made to integrate web services technology with such environments to serve on the web [5]. In the proposed Multi-channel Multi-Container environment, we want to utilize the HPC methods and features, but in a way that integrates tightly with the environment instead of building interfaces to already existing libraries. The model therefore provides a breakdown of a web server page (web service) that the programmer of the service imposes and defines, much as in MPI and PVM, where the programmer provides the segmentation of the program and the conditions under which each portion should run. Also, the CLM in the proposed multi-channel environment is designed for the web, which eliminates a lot of the layered integration overhead; unlike HPC applications, web applications are not long-running, and they are database-intensive applications that might need some processing power, but not as much as, say, an N-Queens MPI solver. Thus, as we will see shortly, the CLM in the multi-channel environment has two main groups of tasks that can be divided into client tasks and server tasks. The CLM should be able to give the HPA cluster-based information that enables the HPA embedded dispatcher to initiate requests to the least loaded nodes, provide the HPA with the information needed for channel reservation and initialization, and assist the HPA in takeover situations resulting from failures. The CLM also has server side duties such as keeping all cluster nodes in sync, providing mechanisms for running services to replicate their state between nodes over the distributed replicated shared memory cluster engine, and carrying out intermediate tasks between the Deployment Manager and the service factories of the different nodes to enable cross-cluster service deployment. The CLM is basically an engine that resides right above the communication layer of the container node, and it is designed to control the service factory, the reserved channels queue, and the shared memory segment.
Since the new proposed web environment is designed to be a clustered one, it should appear to an outsider as a single entity, and all the internal communication of the cluster should be done transparently so that all nodes are synchronized together without publishing the details of the internal structure of the environment to its clients. Hence some of the nodes of the cluster carry extra cluster-related functionality to manage this node synchronization process and act as leaders for the environment, deciding which node does what, and when. This cluster-based functionality could be handled by a single node of the cluster, but since this is intended to be a real cluster, high availability is a must and needs to be considered a priority, so a number of nodes are nominated to carry out such cluster-based functionality and act as replicas of each other; when one fails the others are still able to function and fulfill the desired role. By this definition, nodes with this type of cluster-based functionality will not only have the role of synchronizing the different cluster component nodes; more importantly, they act as a database that represents all the cluster attributes and all the node states and capabilities at any point in time, which needs to be updated frequently and efficiently to avoid overhead and discrepancy between the databases of different manager nodes. This gives such a group of nodes the capability of acting on behalf of the whole cluster in some cases, which leads to the desired SSI (Single System Image) functionality that makes the cluster appear as a single entity capable of encapsulating, controlling, and hiding all its internal conflicts and operations from its clients. The member nodes of our new environment are therefore of two kinds, namely Compute Nodes and Management Nodes. Before going into the details of the CLM architecture and how it works, the following are the details of the role and the functionalities of each type.
III.1 Compute Node
A compute node is a normal container node that, besides being able to serve channel requests, runs a minimal-function CLM version that works as a managed node controlled by the Management Nodes of the cluster and synchronizes with the other nodes through communication with the management nodes of the cluster. The CLM of a compute node acts as a client agent to the CLM of the Management Nodes of the cluster. The CLM of a compute node has the following subsystems.
III.1.1 Reservation Service
The reservation service is a UDP-based network communication service which listens on a specific port provided by the Configuration Manager [3]. The service waits for requests from the HPA asking to reserve a channel in order to be able to initiate requests to this node. On receiving a reservation request, the CLM attempts to locate available channels that are either not already reserved by HPAs, or that have been idle for a predefined duration and are marked as expired and ready for reuse. If candidate channels are located, they are reserved for this HPA, and they will reject any service requests coming from a source other than that HPA until they expire after being idle. The port numbers of the assigned channels are sent back to the HPA as a reply to its request in case of successful reservation, or a failure notification otherwise. Because we are using the stateless UDP communication protocol, if an HPA initiates a reservation request to a compute node that already has channels assigned to this HPA, the CLM replies to the HPA with the ports of the already reserved channels and timestamps the channels with the current time of registration to avoid quick expiration.
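A hedged sketch of the HPA side of this reservation exchange is shown below. Because both the request and the reply fit in a single UDP packet (as discussed next), a plain timeout/resend loop is enough; the textual message format, the host name and the reservation port used here are assumptions for illustration, not the actual protocol.

import java.net.*;
import java.nio.charset.StandardCharsets;

// Hypothetical reservation client: one-packet request, one-packet reply,
// timeout/resend instead of flow control. Message format and port are assumed.
public class ReservationClientSketch {
    public static void main(String[] args) throws Exception {
        InetAddress node = InetAddress.getByName("container-node-1");   // hypothetical container host
        byte[] req = "RESERVE".getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket sock = new DatagramSocket()) {
            sock.setSoTimeout(500);                          // resend after 500 ms of silence
            for (int attempt = 0; attempt < 5; attempt++) {
                sock.send(new DatagramPacket(req, req.length, node, 9200));
                try {
                    byte[] buf = new byte[512];
                    DatagramPacket reply = new DatagramPacket(buf, buf.length);
                    sock.receive(reply);
                    String ports = new String(reply.getData(), 0, reply.getLength(), StandardCharsets.UTF_8);
                    System.out.println("reserved channel ports: " + ports);
                    return;
                } catch (SocketTimeoutException lost) {
                    // packet lost or node busy: fall through and resend
                }
            }
            System.out.println("reservation failed after retries");
        }
    }
}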
Also, because the reservation request by the HPA and the reservation reply by the container node are small enough to be encapsulated in a single UDP packet, the reservation communication is very simple and can be based on timeout/resend mechanisms, avoiding the need to develop any flow control mechanism for this communication. More precisely, the Reservation service can be considered the channel manager, as it is the service in control of the communication channels in terms of assigning them to HPAs and resetting their status to available when idle.
III.1.2 Shared Memory Segment
The Shared Memory Segment is basically a memory segment that keeps shared data of services running on different nodes. For service state migration purposes, each service has the capability of storing its state into an XML stream and forwarding it to the CLM of the node it is running on, asking it to replicate it to the cluster nodes; consequently, and transparently to the service, the CLM is responsible for replicating that service state to all nodes of the cluster to be stored in their shared memory segments. As each service on the cluster has a unique ID that identifies it, composed of the cluster node ID and the location of the service in the service queue, the state of the service is stored by its service ID on all nodes of the cluster. Hence, if a node fails, service state migration takes place, and a node is chosen to continue the execution of the failing service, the chosen node uses the state of the service stored in its Shared Memory Segment to resume the execution and serve the client HPA from the nearest point to the failure, without re-executing the whole service from the beginning. We describe the Service State Migration mechanism shortly and illustrate how the Shared Memory Segment is crucial for Service State Migration to function. Each time the state of a service is generated for replication, the sender CLM timestamps the state, so that when it is stored on remote nodes the timestamp defines the freshness of the stored state. This is very important because, when UDP packets sent to a remote CLM fail to arrive at their destination, this timestamp lets the management nodes differentiate between up-to-date and lagging nodes when searching for a takeover node after a service state migration is initiated. It is very important to highlight that the Shared Memory Segment replication works on top of a UDP communication scheme, and for this replication to operate as fast and reliably as possible we impose one important constraint: the state of any service should fit into one UDP packet, which allows the communication to be built on top of timeout/resend mechanisms, avoiding the need for a flow control mechanism which would slow down the transfer and might cause what is called a traffic storm, leading to congestion in the internal network of the cluster.
III.1.3 Load Replicator
As all the nodes of the cluster need to be synchronized so that the cluster appears as a Single System Image, which most clustered environments in different domains aim at, each node of the cluster needs to send its state, represented mainly by the processor idle time, free memory, network bandwidth used, and number of channels being served, to the other nodes of the cluster. This kind of data is very small and can be encapsulated in one UDP packet and sent at a fixed interval which, given the nature of the data, is very short, on average a couple of seconds. The node load data is stored on all nodes of the cluster, although in the current version of our architecture this data is used only by the management nodes, as we describe in the next section. We nevertheless replicate it on the other compute nodes as it might be used in the future as the basis of some decisions; for example, nodes which are very busy or marked as failed should not receive shared memory data, so when a service tries to replicate its state, such nodes would not be included in the list of nodes that the state is sent to.
III.2 Management Node
The Management Node is a normal compute node with some extra features and capabilities that enable it to carry out cluster-based management tasks. There are two main management tasks carried out by the CLM of a Management Node: Container management and HPA management. Two services are started on any Management Container Node, listening on two UDP ports defined in the container configuration and provided to the CLM by the Configuration Manager [3].
III.2.1 Management Service
The Management Service is the service which carries out container-related cluster management tasks, mainly service state migration initiation, a complex operation based on communicating with different nodes and resulting in the choice of a candidate node to take over the execution of an already running service that fails to complete its execution.
III.2.2 HPA Management Service
The HPA management service is responsible for responding to HPA requests and providing them with cluster-related information, which can be summarized in the following points:
1. Discovery requests at HPA startup for the cluster nodes, their types, and their service languages.
2. Frequent discovery requests initiated by HPAs requesting information about the cluster node loads, to be able to choose the least loaded nodes to forward service requests to.
3. Takeover requests initiated by HPAs timing out on a failing service.

Figure 6: Cluster Management Architecture

IV. CLUSTER MANAGER ARCHITECTURE
IV.1 Overview
Figure 6 gives an overview of the CLM of the different node types, compute and management nodes, and how they communicate together through the CLM communication layer. As is obvious from the diagram and from the description of the cluster services in the previous section, the cluster services on different nodes communicate together on a predefined port number, and the communication unit is a UDP packet which in most cases needs to land on all nodes in the cluster; hence a kind of multicast communication scheme is needed.
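The Shared Memory Segment replication described in III.1.2, together with the one-packet constraint and the per-node sends that the next section calls emulated multicast, can be sketched roughly as follows. The XML layout, the CLM port and the 1400-byte budget are assumptions for illustration.

import java.net.*;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical CLM-side replication: the serialized state (plus service id and
// timestamp) must fit in one UDP packet and is sent to each neighbour node
// individually (emulated multicast).
public class StateReplicationSketch {
    static final int MAX_STATE_BYTES = 1400;   // one-UDP-packet constraint (assumed budget)

    public static void replicate(String serviceId, String xmlState, List<InetAddress> neighbours)
            throws Exception {
        String msg = "<state id='" + serviceId + "' ts='" + System.currentTimeMillis() + "'>"
                + xmlState + "</state>";
        byte[] data = msg.getBytes(StandardCharsets.UTF_8);
        if (data.length > MAX_STATE_BYTES)
            throw new IllegalStateException("service state does not fit in one UDP packet");
        try (DatagramSocket sock = new DatagramSocket()) {
            for (InetAddress n : neighbours)                    // emulated multicast: one send per node
                sock.send(new DatagramPacket(data, data.length, n, 9300));   // assumed CLM port
        }
    }

    public static void main(String[] args) throws Exception {
        replicate("node1-svc7", "<file>report.dat</file><offset>4200</offset>",
                List.of(InetAddress.getLoopbackAddress()));
    }
}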
For example, in the case of the shared memory segment, the state of one service on one node needs to be sent to all or some nodes of the cluster to be replicated in their shared memory segments for future use in case of service state migration. The cluster services also run in parallel threads within the CLM, which makes them independent, yet they communicate with each other through internal APIs in the form of object method invocations coupled with some locking mechanisms for handling critical sections.
IV.2 Communication Layer
The communication layer used in the CLM is an extended version of the normal UDP-based communication layer used for serving HTTP channel requests between a normal container and an HPA requesting a service execution. The communication layer is extended with an emulated multicast capability that allows one node at any point in time to send a packet to all or some nodes of the cluster. The emulated multicast functionality acts, transparently from the caller's point of view, as a normal multicast function, but it physically connects to each node in the recipients queue and sends it a packet containing the data that needs to be sent. As described above, one important constraint holds for all cluster services, which is the need to encapsulate all the data of any cluster transaction in one packet.
IV.3 Obstacles and Workarounds
Two main problems arise from the above architecture which need some sort of workaround, or at least justification. The first is a justification for why we used emulated multicasting instead of the native multicasting available in UDP; the other is a workaround for what is called a traffic storm, which arises from the usage of emulated multicasting, as packets are physically sent individually to each node in the recipient queue instead of being sent once on a multicast address, which would seem to save a lot of traffic on the cluster internal network.
IV.3.1 Native Multicast vs. Emulated Multicast
Although the UDP socket library provides packet multicasting, which saves a lot of traffic when a single packet needs to be sent to a group of nodes on the network, which is exactly what the CLM needs, two constraints arise which pushed us to implement an emulated multicast functionality in the container communication layer:
1. Multicast is not standard in all environments and needs specific configuration on the network devices of the container cluster network; one of the most important features we are after is being able to have a hybrid environment in terms of operating systems and container software.
2. In most multicast libraries, fragmentation and reassembly are not available for multicast packets, which leaves very limited space per packet to transfer data, and since we have the constraint of encapsulating the data of any cluster transaction in one UDP packet, native multicast would represent a major obstacle to the CLM capabilities. Some mechanisms such as compression could be used to send more data within the same limited packet size, yet the performance drawback of this would be high, as most cluster transactions are carried out frequently and continuously.
IV.3.2 Traffic Storm
Because the emulated multicasting functionality is used by the container node CLMs, more bandwidth is needed by the CLM communication layer to be able to operate. Moreover, due to the use of a stateless protocol and the adopted timeout/resend mechanisms, a symptom called a traffic storm may occur at peaks, especially when the number of services and/or the number of nodes in a cluster increases. Practically speaking this is unavoidable, yet we can make some decisions that help avoid reaching such a situation frequently. The suggested workaround is that for some cluster services, such as service state replication, the replication occurs on a subset of nodes and not all the nodes in the cluster. Practically, shared memory replication is the service requiring the most data transfer compared with the load replicator and discovery services, and in fact a failing service needs only one service to take over from it, so there is no need to replicate the state on all nodes of the cluster. Thus, from the shared memory replication perspective we might have sub-clusters, defined by the administrator, that replicate with each other.

V. CLUSTER MANAGER SCENARIOS
V.1 Discovery Scenario
The discovery scenario is initiated at HPA startup. Figure 7 and the following steps describe this scenario.

Figure 7: HPA Startup

1. The HPA reads its configuration file on startup to identify the primary management node that it will connect to.
2. The HPA connects to the HPA Management port of the cluster it needs service from.
3. The HPA sends the management node a request for information about the cluster.
4. The management node sends back to the HPA an XML stream whose records represent all the nodes of the cluster, their types (management or compute node), and the language of the services they run. The reserver port on each node is also sent to the HPA for future reference. It is up to the primary management node to send all or only some of the node information to the HPA, meaning that this can be a form of load balancing very close to the DNS rotation mechanism, distributing load at peak times.
5. The HPA parses and stores the cluster information in its internal buffer for channel reservation and for load information amendments.
6. If the HPA receives no response from the management node, it keeps resending the discovery request over a predefined timeout for a predefined number of retries before it reports failure and exits.
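Step 5 can be sketched as below: the HPA turns the XML stream into an internal node table. The element and attribute names used here are assumed for illustration; the actual record layout of the discovery reply is not specified at this level of detail.

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical parsing of a discovery reply into the HPA's internal node table.
public class DiscoveryParseSketch {
    public static void main(String[] args) throws Exception {
        String reply = "<cluster>"
                + "<node name='node1' type='management' lang='cpp'  reserverPort='9200'/>"
                + "<node name='node2' type='compute'    lang='java' reserverPort='9200'/>"
                + "</cluster>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(reply.getBytes()));
        Map<String, String[]> nodeTable = new HashMap<>();            // HPA internal buffer
        NodeList nodes = doc.getElementsByTagName("node");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            nodeTable.put(e.getAttribute("name"),
                    new String[]{e.getAttribute("type"), e.getAttribute("lang"), e.getAttribute("reserverPort")});
        }
        System.out.println("discovered nodes: " + nodeTable.keySet());
    }
}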
V.2 Channel Reservation Scenario
Continuing with the same figure, the HPA iterates over each node in the discovery information sent to it and initiates a reservation request to it. The reservation process follows these steps:
1. The HPA sends a reservation request to a specific container node.
2. The container node searches for channels already reserved by the requesting HPA; if it finds any, it stamps them with the current time and sends the port numbers of the channels back to the HPA.
3. If not, the reserver looks for available channels to assign to the HPA; if any are found they are reserved for the IP of the HPA, so no requests are accepted on those channels except from the IP address of the requesting HPA. The channels are then stamped with the current time of the reservation to start calculating the idle time. Each time the HPA communicates over a specific channel with a container, the container updates the time stamp of the channel with the current time.
4. If not, the reserver starts looking for channels whose time stamp is too old, which means that they are no longer reserved, so they are cleared and reserved for the requesting HPA.
5. If not, the HPA receives a failure message which indicates that all the channels within the current container are occupied.
V.3 Shared Memory Segment Replication Scenario
The Shared Memory Segment replication is illustrated in Figure 8 and the steps below. As the figure appears complex, just follow the red bold arrows:

Figure 8: Shared Memory Segment Replication and Service State Migration

1. A running service should call its serialize routine periodically, which is a method that the service developer overloads.
2. The overloaded serialize routine should contain the code necessary to produce a string stream that represents the state of the service.
3. At the exit of the routine the CLM is invoked with the state stream.
4. The CLM sends the state to all its neighbors defined in the cluster queue (this might not be all nodes in the cluster) to be stored in their Shared Memory Segments. The state stream is coupled with the unique ID of the service and a time stamp representing the freshness of the state.
5. One important attribute that is also coupled with the state is the serial number of the last packet sent, which helps a takeover service resume the data transfer without disturbing the flow control of the already established communication.
V.4 Service State Migration Scenario
Continuing with the same diagram, the service state migration is initiated by the HPA reporting a delay in service delivery. Follow the green bold arrows in the diagram as well as the steps defined below:
1. When an HPA fails to receive the reply from a running service that it initiated, it sends a help request to one of the management node CLMs.
2. The management node CLM acknowledges receipt of the request; otherwise the HPA keeps sending the request over a timeout period for a predefined number of retries.
3. The management node CLM asks all the nodes for the version of the state stored in their shared memory segments for the requested service.
4. On the nodes' replies, the management node CLM picks the node that has the freshest copy of the state as well as enough resources to handle the failing request (see next section), and notifies it, through its CLM, that it is the takeover candidate.
5. The candidate node CLM contacts its service factory and the channel reserver to avail a channel and a service of the same type (as defined by the configuration manager in service.xml [3]).
6. The candidate node CLM then calls the de-serialize routine of the chosen service with the state in its Shared Memory Segment. The de-serialize routine is a method overloaded by the developer of the service.
7. The de-serialize routine should contain the code necessary to read the state and set the internal attributes of the service object to the state represented in the state stream.
8. The CLM then calls the main routine of the service to fire its execution.
9. Since we use UDP, the HPA can still receive the recovered stream on the same UDP socket handler.
10. The HPA discards any packets that it has already received from the original service, as failure may happen at any point between two consecutive state replications.
V.5 Cluster Nodes Load Replicator
The load replicator is a means for a container node to tell the others about its free capacity. This is used in two main cases: finding a candidate for a service state migration, and the choice of container nodes by an HPA to establish the channels required to serve a server page. Thus we have two scenarios; the first is Container-to-Container load replication and the other is Container-to-HPA load replication.
V.6 Container to Container Scenario
The Container-to-Container load replication scenario works in a push mode where each node periodically sends an XML stream with its state to all other nodes. The management node primarily stores the load data of all nodes to be able to make service state migration decisions as well as respond to load inquiries from the HPAs. If the management node does not hear from a specific node for a predefined duration of time, this node is marked disabled by the management node and is not considered as a candidate in service state migration until it starts sending its state again.
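As a rough illustration of how this load and state information might be combined in step 4 of the service state migration scenario, the sketch below has the management node pick, among the nodes that replied, the one holding the freshest replicated state and a non-zero availability priority. The reply fields and the use of the load replicator's priority value are assumptions.

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical takeover selection: freshest replicated state wins among nodes
// that are still usable (priority > 0).
public class TakeoverSelectionSketch {
    record StateReply(String node, long stateTimestamp, int loadPriority) { }

    static Optional<StateReply> chooseTakeoverNode(List<StateReply> replies) {
        return replies.stream()
                .filter(r -> r.loadPriority() > 0)                          // 0 means the node is unusable
                .max(Comparator.comparingLong(StateReply::stateTimestamp)); // freshest replicated state wins
    }

    public static void main(String[] args) {
        List<StateReply> replies = List.of(
                new StateReply("node2", 1_700_000_100L, 3),
                new StateReply("node3", 1_700_000_250L, 1),
                new StateReply("node4", 1_700_000_250L, 0));   // overloaded or failed, excluded
        System.out.println("takeover candidate: "
                + chooseTakeoverNode(replies).map(StateReply::node).orElse("none"));
    }
}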
Kareim Sobhe et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES Vol No. 4, Issue No. 1, 069 - 080
The Container to HPA load replication scenario works in pull mode, were the HPA will periodically initiate a load info request from the management node and the management node will respond with and XML stream; each record in the XML stream will refer to a node through its name and an integer which represents its priority representing the availability of the node. A node with a priority 0 means that this nod is no longer exists, and the higher the value of the priority the more strong the node is. On every load pull, the HPA will update its discovery buffer with the loads so it can make decisions on which nodes should be used which requesting services channels.
VI- EXPERIMENTS
The cluster-related illustrative experiments showing startups of HPAs and Containers are introduced in [2, 3]. This section illustrates an example of the Shared Memory Replication and Service State Migration scenarios, carried out across different container types, C++ and JAVA. The example used in this section is based on a service that reads a file and sends it to the browser; the file name is passed to the service in the URL query string as a normal GET request parameter, and we call this service the File Service. This service has two implementations, one in C++ and the other in JAVA, and each version is deployed on a separate container. The service implementation utilizes the serialize and de-serialize routines, which are overloaded by the two implementations of the service; thus a running instance of the service will periodically multicast its state to be replicated over the different nodes of the cluster. The initial HTTP request is directed to the C++ service hosted on the C++ container, which is at the same time the management node of the cluster. The C++ implementation is designed to fail after sending a portion of the requested file. The expected behavior is that the running service will start sending the file over UDP packets and will issue a state replication to the CLM periodically, every fixed number of UDP packets sent. The replicated state contains the file name being served, the last location of the file being sent, and the sequence number of the last UDP packet sent to the HPA. On failure of the service, where the HPA fails to receive the service reply successfully, the HPA reports the failure to the management node CLM; the container management node should then transparently initiate the service state migration process, and the HPA should be able to resume receiving the rest of the reply transparently on the same socket handler.
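As an illustration of the serialize/de-serialize pair this experiment relies on, the following Java sketch shows the kind of state a File Service implementation might replicate: the file name, the last file position sent, and the sequence number of the last UDP packet. The field layout, the pipe-delimited encoding, and the class name are assumptions made for this sketch; the real routines follow the container's service API.

```java
// Illustrative sketch of the File Service state and its serialize/de-serialize pair.
import java.nio.charset.StandardCharsets;

public class FileServiceState {
    private String fileName;      // file being served
    private long fileOffset;      // last byte position already sent
    private long lastSequence;    // sequence number of the last UDP packet sent

    // Called periodically (every N packets) to produce the state stream that
    // the CLM replicates to the shared memory segments of the other nodes.
    public byte[] serialize() {
        String state = fileName + "|" + fileOffset + "|" + lastSequence;
        return state.getBytes(StandardCharsets.UTF_8);
    }

    // Called on the takeover node with the freshest state copy; restores the
    // internal attributes so the main routine can resume from the same point.
    public void deserialize(byte[] stream) {
        String[] parts = new String(stream, StandardCharsets.UTF_8).split("\\|");
        this.fileName = parts[0];
        this.fileOffset = Long.parseLong(parts[1]);
        this.lastSequence = Long.parseLong(parts[2]);
    }

    // Bookkeeping after each UDP packet is sent.
    public void advance(int bytesSent) {
        fileOffset += bytesSent;
        lastSequence++;
    }

    public String getFileName()   { return fileName; }
    public long getFileOffset()   { return fileOffset; }
    public long getLastSequence() { return lastSequence; }
}
```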
The following console snapshots show how this process takes place while running the experiment that tests the shared memory replication and service state migration functionalities. As can be seen from the snapshots, the HPA starts receiving the reply from the C++ service, and meanwhile it can be seen clearly on the C++ and JAVA consoles that the service on the C++ container is replicating its state while the JAVA container is receiving that state and storing it in its Shared Memory Segment. When the failure happens, the HPA starts reporting a timeout situation, and after a number of retries it reports failure and starts asking the management node for a takeover. As can be seen on the C++ console, since the C++ node is the management node, it starts the take-over process to find a takeover candidate node. On the JAVA console, the JAVA container receives the takeover request and sends a takeover reply with the freshness of the service state that it stores in its shared memory segment; the management node accepts and fires the approval to the JAVA node, and it can be seen clearly on the JAVA console that the JAVA container starts taking over, reserving a temporary channel for this service, and resuming the service from the last state it received.

VI.1 Case Study: Web Usage Statistics
Web usage statistical analysis applications are important applications in the web domain. For large web portals, the usage data is very large and arrives in large volumes and at high frequency, which makes the statistical calculations processing intensive. Two approaches can be used to generate web usage statistical reports:
VI.1.1 Online Batch Reports Processing
For this option, the data of a specific period is processed and rolled up into a small database that holds the results of the usage reports and eliminates many details that in some cases might still be needed. This approach is the more widespread one, as the need to see usage reports online is business dependent and not in high demand among portal owners. The kinds of reports to be generated are predefined and cannot be altered at run time, since the details from which the reports are generated are not included in the report database.

VI.1.2 Online Usage Report Generation
For this option, the raw usage data is stored in a database and mined to generate up-to-date online reports over any duration, not limited to the predefined roll-up periods of the previous approach. This approach provides great flexibility for generating or even customizing reports with no limits; however, such reports take more time to produce than those generated by the batch approach. Of interest here is the second approach, "Online Usage Report Generation". An attempt is made to enhance its execution and obtain better performance by deploying it on the multichannel environment. One of the most processing-intensive parts of the usage report is page clustering. Generating usage statistics per portal page is easy and quick; however, portal owners are usually not interested in viewing the statistics of individual pages, as a portal will usually contain many pages and there is no real benefit from such a report. Rather, a portal owner or manager is more interested in calculating usage for parts of the web portal, meaning that groups of pages are clustered and defined as belonging together under specific sections of the web portal, in which case usage is presented per cluster of pages, a very processing-intensive task. This case study shows how dividing the task into smaller sub-tasks affects the performance with respect to the traditional way of executing such a task.
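To make the shape of this computation concrete, the following Java sketch shows one way the per-cluster usage count could be split into sub-tasks: each sub-task aggregates hits for its own subset of page clusters. The in-memory collections, class name, and partitioning scheme are assumptions for illustration; in the case study the hit records come from the usage database and each subset is served over a different channel.

```java
// Illustrative sketch of per-cluster usage counting split into sub-tasks.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ClusterUsageReport {
    // Counts hits per page cluster for one sub-task (one subset of clusters).
    public static Map<String, Long> usageForClusters(Set<String> clusterSubset,
                                                     Map<String, String> pageToCluster,
                                                     List<String> hitPages) {
        Map<String, Long> counts = new HashMap<>();
        for (String page : hitPages) {
            String cluster = pageToCluster.get(page);
            if (cluster != null && clusterSubset.contains(cluster)) {
                counts.merge(cluster, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Splits the cluster set into roughly equal subsets, one per channel.
    public static List<Set<String>> partition(Set<String> clusters, int channels) {
        List<String> ordered = clusters.stream().sorted().collect(Collectors.toList());
        List<Set<String>> parts = new ArrayList<>();
        int chunk = (ordered.size() + channels - 1) / channels;
        for (int i = 0; i < ordered.size(); i += chunk) {
            parts.add(new HashSet<>(ordered.subList(i, Math.min(i + chunk, ordered.size()))));
        }
        return parts;
    }
}
```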
VI.1.3 Experiment Setup
The following experiments were run against real live data from a very high-traffic online portal. The usage database in our experiments holds 2 months of data with over 750,000 hits, 200,000 visits, and 150,000 visitors. The portal comprises 650 unique pages, all of which are dynamic. These pages are categorized into 25 clusters with an average of 26 pages per cluster and a maximum of 42 pages per cluster. The experiments were run on both environments, the traditional and the multi-channel. Two variables were changed throughout the experiment runs: the number of concurrent requests, and the number of nodes in the serving cluster.

VI.1.4 Results
First the experiments were run on the multichannel and the traditional web environments against one server node. The results show a boost in performance in the runs on the multichannel environment, as shown in Figure 9. The chart presents the average percentage gain in performance over all channels per run, and in every run the number of concurrent requests is increased by 2. A new cluster node was introduced in the second experiment to see the effect of clustering and how the multichannel environment benefits from adding new nodes. Figures 9-13 display the average percentage of performance gain when augmenting the number of nodes in the back-end cluster. The same pattern is repeated as more nodes are added, with the only difference that the number of concurrent requests threshold is increased as the number of nodes in the back-end cluster is increased. Figure 13 compares the four experiments, showing that the multichannel environment makes use of each added node. At the end of each experiment a new node was added, and a difference in performance can be observed clearly in the final run of the chart.
The following are the most important observations from the above results:
1. The multichannel environment provided a high performance gain of around 110% on average.
2. The gain in performance was directly proportional to the increase in the number of nodes in the cluster.
3. As the number of concurrent requests increased, the percentage gain in performance decreased.
4. The decrease in the average percentage of performance gain is not linear relative to the increase in the number of concurrent connections. For example, in the third experiment, where there were 3 server nodes, comparing the first and the last runs reveals that the average gain in performance (gain in performance per request) decreased by 20% while the number of concurrent requests increased 15 times, and the number of concurrent database queries increased 15 times as well.
5. The rate of decrease in performance gain relative to the increase in the number of concurrent requests is almost linear.
6. It was noticed during the experiments that container CPU usage during the multichannel experiments was higher than during the traditional environment experiments, which indicates that the multichannel environment utilizes more resources to provide better performance.

VII- CONCLUSION
In conclusion, we end up with a web environment that is designed entirely for serving web transactions and is able to perform real distributed execution within a transaction request, providing high performance computing features not through integration with HPC tools, but within the framework of the web environment itself, inherited entirely from common web concepts such as server-side scripting standards and extending the HTTP protocol to serve such means. The fruit of this is a number of features such as transparent fault tolerance without full re-execution of the failing service, and the ability of heterogeneous services to exchange service state messages and emulate a unified shared memory between all cluster nodes, whose content can be used by services written in different technologies. Moreover, the CLM contributed to presenting the Multi-Channel cluster as a Single System Image (SSI). It can be seen that with this kind of CLM functionality, the Multi-Channel environment can be situated between web application servers and grid environments; it provides almost all the functionalities that a normal web application environment provides, and a subset of the functionalities provided by a grid environment.
REFERENCES
[1] A. Sameh, K. Sobh, "Multi-Channel Clustered Web Application Servers Architecture", Working paper.
[2] A. Sameh, K. Sobh, "Multi-Channel Clustered Web Application Servers Configuration Manager", Working paper.
[3] A. Sameh, K. Sobh, "Multi-Channel Clustered Web Application Servers Deployment Manager", Working paper.
[4] A. Sameh, K. Sobh, "Multi-Channel Clustered Web Application Servers Cluster Manager", Working paper.
[5] J. Crowcroft, I. Phillips, TCP/IP and Linux Protocol Implementation: Systems Code for the Linux Internet. John Wiley & Sons, Dec 2001. ISBN-10: 0471408824, ISBN-13: 978-0471408826.
Figure 9: Server Node - Web Usage Statistics, Average Percentage Gain in Performance: 1 Container
Figure 10: Server Nodes - Web Usage Statistics, Average Percentage Gain in Performance: 2 Containers
Figure 11: Server Nodes - Web Usage Statistics, Average Percentage Gain in Performance: 3 Containers
Figure 12: Server Nodes - Web Usage Statistics, Average Percentage Gain in Performance: 4 Containers
Figure 13: Performance Comparison with respect to Number of Nodes