ETPL PDS - 001
Space Shuffle: A Scalable, Flexible, and High-Performance Data Center Network
The increasing demands of cloud and big data applications require data center networks to be scalable and bandwidth-rich. Current data center network architectures often use rigid topologies to increase network bandwidth. A major limitation is that they can hardly support incremental network growth. Recent work has been investigating new network architectures and protocols to achieve three important design goals of large data centers, namely, high throughput, routing and forwarding scalability, and flexibility for incremental growth. Unfortunately, existing data center network architectures focus on one or two of these properties and pay little attention to the others. In this paper, we design a novel flexible data center network architecture, Space Shuffle (S2), which applies greedy routing on multiple ring spaces to achieve high throughput, scalability, and flexibility. The proposed greedy routing protocol of S2 effectively exploits the path diversity of densely connected topologies and enables key-based routing. Extensive experimental studies show that S2 provides high bisection bandwidth and throughput, near-optimal routing path lengths, extremely small forwarding state, fairness among concurrent data flows, and resiliency to network failures.
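To make the greedy routing idea concrete, here is a minimal Python sketch of next-hop selection over multiple ring spaces; the coordinate layout and all names are our assumptions for illustration, not the paper's actual protocol:

    def ring_distance(a, b):
        """Circular distance between two coordinates in [0, 1)."""
        d = abs(a - b)
        return min(d, 1.0 - d)

    def greedy_next_hop(current, dest, neighbors, num_spaces):
        """Forward to the neighbor that gets closest to the destination
        in any one ring space; None signals a greedy local minimum."""
        best = None
        best_d = min(ring_distance(current.coords[s], dest.coords[s])
                     for s in range(num_spaces))
        for nb in neighbors:
            for s in range(num_spaces):
                d = ring_distance(nb.coords[s], dest.coords[s])
                if d < best_d:
                    best, best_d = nb, d
        return best

Each switch is assumed to carry one coordinate per ring space (its .coords list); key-based routing then falls out naturally, since any key hashed into the ring spaces can serve as a destination.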
ETPL PDS - 002
Towards Practical and Near-optimal Coflow Scheduling for Data Center Networks
In current data centers, an application (e.g., MapReduce, Dryad, a search platform, etc.) usually generates a group of parallel flows to complete a job. These flows compose a coflow, and only completing all of them is meaningful to the application. Accordingly, minimizing the average Coflow Completion Time (CCT) becomes a critical objective of flow scheduling. However, achieving this goal in today's Data Center Networks (DCNs) is quite challenging, not only because the scheduling problem is theoretically NP-hard, but also because it is tough to perform practical flow scheduling in large-scale DCNs. In this paper, we find that minimizing the average CCT of a set of coflows is equivalent to the well-known problem of minimizing the sum of completion times in a concurrent open shop. As there are abundant existing solutions for concurrent open shop, we open up a variety of techniques for coflow scheduling. Inspired by the best known result, we derive a 2-approximation algorithm for coflow scheduling, and further develop a decentralized coflow scheduling system, D-CAS, which avoids the system problems associated with current centralized proposals while addressing the performance challenges of decentralized approaches. Trace-driven simulations indicate that D-CAS achieves performance close to Varys, the state-of-the-art centralized method, and significantly outperforms Baraat, the only existing decentralized method.
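As a hedged illustration of the concurrent-open-shop view, the toy Python below serves coflows in a fixed permutation, with each port serving one byte per time unit; the smallest-bottleneck-first rule is a common heuristic stand-in, not the paper's 2-approximation or D-CAS itself:

    def avg_cct(coflows, order):
        """coflows: list of {port: bytes}; order: a permutation of indices."""
        port_time = {}                # when each port finishes its queued work
        total = 0.0
        for i in order:
            finish = 0.0
            for port, demand in coflows[i].items():
                t = port_time.get(port, 0.0) + demand
                port_time[port] = t
                finish = max(finish, t)   # a coflow completes with its last flow
            total += finish
        return total / len(coflows)

    def smallest_bottleneck_first(coflows):
        return sorted(range(len(coflows)),
                      key=lambda i: max(coflows[i].values()))

    coflows = [{"p1": 4, "p2": 1}, {"p1": 1}, {"p2": 2, "p3": 2}]
    print(avg_cct(coflows, smallest_bottleneck_first(coflows)))   # 8/3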
ETPL PDS - 003
PathGraph: A Path Centric Graph Processing System
Large-scale iterative graph computation presents an interesting systems challenge due to two well-known problems: (1) the lack of access locality and (2) the lack of storage efficiency. This paper presents PathGraph, a system for improving iterative graph computation on graphs with billions of edges. First, we improve the memory and disk access locality of iterative computation algorithms on large graphs by modeling a large graph as a collection of tree-based partitions. This enables us to use path-centric computation rather than vertex-centric or edge-centric computation. For each tree partition, we re-label vertices using DFS in order to preserve consistency between the order of vertex ids and the vertex order along paths. Second, compact storage that is optimized for iterative graph-parallel computation is developed in the PathGraph system. Concretely, we employ delta-compression and store tree-based partitions in DFS order. By clustering highly correlated paths together as tree-based partitions, we maximize sequential access and minimize random access on storage media. Third but not least, our path-centric computation model is implemented using a scatter/gather programming model. We parallelize the iterative computation at the tree-partition level and perform sequential local updates for vertices in each tree partition to improve convergence speed. To provide well-balanced workloads among parallel threads at the tree-partition level, we introduce a task queue with multiple stealing points, which allows work stealing from multiple points in the queue. We evaluate the effectiveness of PathGraph by comparing it with recent representative graph processing systems such as GraphChi and X-Stream. Our experimental results show that our approach outperforms the two systems on a number of graph algorithms for both in-memory and out-of-core graphs. While our approach achieves better data balance and load balance, it also shows better speedup than the two systems as the number of threads grows.
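The DFS re-labeling step is simple enough to show directly; this is a minimal sketch over one tree partition (the dict-of-children tree representation is our assumption):

    def dfs_relabel(root, children):
        """children: dict vertex -> list of child vertices in one tree
        partition. Returns {old_id: new_id} with new ids in DFS preorder,
        so id order agrees with vertex order along root-to-leaf paths."""
        new_id, stack, next_id = {}, [root], 0
        while stack:
            v = stack.pop()
            new_id[v] = next_id
            next_id += 1
            for c in reversed(children.get(v, [])):  # visit leftmost child first
                stack.append(c)
        return new_id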
ETPL PDS - 004
Leaky Buffer: A Novel Abstraction for Relieving Memory Pressure from Cluster Data Processing Frameworks
The shift to the in-memory data processing paradigm has had a major influence on the development of cluster data processing frameworks. Numerous frameworks from industry, the open source community, and academia are adopting the in-memory paradigm to achieve functionality and performance breakthroughs. However, despite the advantages of these in-memory frameworks, in practice they are susceptible to memory-pressure-related performance collapse and failures. The contributions of this paper are two-fold. Firstly, we conduct a detailed diagnosis of the memory pressure problem and identify three preconditions for the performance collapse. These preconditions not only explain the problem but also shed light on possible solution strategies. Secondly, we propose a novel programming abstraction called the leaky buffer that eliminates one of the preconditions, thereby addressing the underlying problem. We have implemented a leaky-buffer-enabled hashtable in Spark, and we believe it can also substitute for hashtables performing similar hash aggregation operations in other programs or data processing frameworks. Experiments on a range of memory-intensive aggregation operations show that the leaky buffer abstraction can drastically reduce the occurrence of memory-related failures, improve performance by up to 507%, and reduce memory usage by up to 87.5%.
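A hedged sketch of the leaky buffer abstraction follows: a bounded in-memory buffer that "leaks" its contents to a disk spill file once a threshold is reached, keeping heap usage flat. This illustrates the idea only; it is not the Spark hashtable from the paper:

    import pickle, tempfile

    class LeakyBuffer:
        def __init__(self, capacity=10000):
            self.capacity = capacity
            self.buf = []
            self.spill = tempfile.TemporaryFile()

        def add(self, record):
            self.buf.append(record)
            if len(self.buf) >= self.capacity:
                self._leak()

        def _leak(self):
            # move buffered records to disk, relieving memory pressure
            pickle.dump(self.buf, self.spill)
            self.buf = []

        def drain(self):
            # flush the tail, then stream every record back from disk
            self._leak()
            self.spill.seek(0)
            while True:
                try:
                    yield from pickle.load(self.spill)
                except EOFError:
                    return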
ETPL PDS - 005
DistR: A Distributed Method for the Reachability Query over Large Uncertain Graphs
Among uncertain graph queries, reachability, i.e., the probability that one vertex is reachable from another, is likely the most fundamental one. Although this problem has been studied within the field of network reliability, solutions are implemented on a single computer and can only handle small graphs. However, as graph applications continue to grow in size, the corresponding graph data can no longer fit within a single computer's memory and must therefore be distributed across several machines. Furthermore, the computation of probabilistic reachability queries is #P-complete, making it very expensive even on small graphs. In this paper, we develop an efficient distributed strategy, called DistR, to solve the problem of reachability queries over large uncertain graphs. Specifically, we perform the task in two steps: distributed graph reduction and distributed consolidation. In the distributed graph reduction step, we find all of the maximal subgraphs of the original graph whose reachability probabilities can be calculated in polynomial time, compute them, and reduce the graph accordingly. After this step, only a small graph remains. In the distributed consolidation step, we transform the problem into a relational join process and provide an approximate answer to the #P-complete reachability query. Extensive experimental studies show that our distributed approach is efficient in terms of both computational and communication costs, and has high accuracy.
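As a hint of what "polynomial-time computable subgraphs" can look like, the classic series and parallel reductions for two-terminal reachability are sketched below; that DistR's reduction step uses exactly these rules is our assumption, not a claim from the paper:

    def series(p1, p2):
        # u -p1- v -p2- w with deg(v) = 2 collapses to u -(p1*p2)- w
        return p1 * p2

    def parallel(p1, p2):
        # two independent u-v edges collapse to a single u-v edge
        return 1.0 - (1.0 - p1) * (1.0 - p2)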
ETPL PDS - 006
RAMPS: A Reconfigurable Architecture for Minimal Perfect Sequencing
The alignment of many short sequences of DNA, called reads, to a long reference genome is a common task in molecular biology. When the problem is expanded to handle typical workloads of billions of reads, execution time becomes critical. In this paper, we present a novel reconfigurable architecture for minimal perfect sequencing (RAMPS). While existing solutions attempt to align a high percentage of reads using a small memory footprint, RAMPS focuses on performing fast exact matching. Using the human genome as a reference, RAMPS aligns short reads hundreds of thousands of times faster than current software implementations such as SOAP2 or Bowtie, and about a thousand times faster than GPU implementations such as SOAP3. Whereas other aligners require hours to preprocess reference genomes, RAMPS can preprocess the reference human genome in a few minutes, opening the possibility of using new reference sources that are more genetically similar to the newly sequenced data.
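For intuition, exact matching against a preprocessed reference reduces to a single hash lookup per read. The toy index below uses a plain Python dict where RAMPS uses a minimal perfect hash in reconfigurable hardware, and assumes reads of exactly length k:

    from collections import defaultdict

    def build_index(reference, k):
        index = defaultdict(list)
        for i in range(len(reference) - k + 1):
            index[reference[i:i + k]].append(i)
        return index

    def align(read, index):
        return index.get(read, [])   # exact-match positions, or [] if none

    idx = build_index("ACGTACGTGA", 4)
    print(align("ACGT", idx))        # [0, 4]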
ETPL PDS - 007
Cost Minimization Algorithms for Data Center Management
Due to the increasing usage of cloud computing applications, it is important to minimize the energy cost consumed by a data center and, simultaneously, to improve the quality of service via data center management. One promising approach is to switch some servers in a data center to idle mode to save energy while keeping a suitable number of servers in active mode to provide timely service. In this paper, we design both online and offline algorithms for this problem. For the offline algorithm, we formulate data center management as a cost minimization problem by considering energy cost, delay cost (to measure service quality), and switching cost (to change servers' active/idle mode). Then, we analyze certain properties of an optimal solution, which lead to a dynamic-programming-based algorithm. Moreover, by revising the solution procedure, we eliminate the recursion and obtain an optimal offline algorithm with polynomial complexity. We design the online algorithm by considering the worst-case scenario for future workload. In simulations, we show that this online algorithm always provides near-optimal solutions.
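The offline objective can be written down compactly. The sketch below uses placeholder cost constants and a deliberately simple delay term, so it shows the shape of the problem rather than the paper's exact model:

    def total_cost(active, load, e=1.0, d=5.0, s=2.0):
        """active: servers kept on per time slot; load: workload per slot."""
        cost, prev = 0.0, 0
        for m, l in zip(active, load):
            cost += e * m                    # energy cost of active servers
            cost += d * l / max(m, 1)        # delay cost rises as servers shrink
            cost += s * max(m - prev, 0)     # switching cost to wake servers
            prev = m
        return cost

The paper's dynamic program, roughly speaking, searches over such 'active' sequences to minimize this kind of sum.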
ETPL PDS - 008
Minimum Dependencies Energy-Efficient Scheduling in Data Centers
This work presents an online, energy- and communication-aware scheduling strategy for SaaS applications in data centers. The applications are composed of various services and represented as workflows. Each workflow consists of tasks related to each other by precedence constraints and is represented by a Directed Acyclic Graph (DAG). The proposed scheduling strategy combines the advantages of state-of-the-art workflow scheduling strategies with energy-aware independent task scheduling approaches. The scheduling process consists of two phases. In the first phase, virtual deadlines of individual tasks are set in the central scheduler. These deadlines are determined using a novel strategy that favors tasks that are less dependent on other tasks. During the second phase, tasks are dynamically assigned to computing servers based on the current load of network links and servers in a data center. The proposed approach, called Minimum Dependencies Energy-efficient DAG (MinD+ED) scheduling, has been implemented in the GreenCloud simulator. It outperforms other approaches in terms of energy efficiency, while keeping a satisfactory level of tardiness.
ETPL PDS - 009
LazyCtrl: A Scalable Hybrid Network Control Plane Design for Cloud Data Centers
The advent of software defined networking enables flexible, reliable and feature-rich control planes for data center networks. However, the tight coupling of centralized control and complete visibility leads to a wide range of issues, among which scalability has risen to prominence due to the excessive workload on the central controller. By analyzing traffic patterns from a couple of production data centers, we observe that data center traffic is usually highly skewed and thus edge switches can be clustered into a set of communication-intensive groups according to traffic locality. Motivated by this observation, we present LazyCtrl, a novel hybrid control plane design for data center networks where network control is carried out by distributed control mechanisms inside independent groups of switches, complemented by a global controller. LazyCtrl aims at bringing laziness to the global controller by dynamically devolving most of the control tasks to independent switch groups to process frequent intra-group events near the datapath, while handling rare inter-group or other specified events at the controller. We implement LazyCtrl and build a prototype based on Open vSwitch and Floodlight. Trace-driven experiments on our prototype show that an effective switch grouping is easy to maintain in multi-tenant clouds and that the central controller can be significantly shielded by staying "lazy", with its workload reduced by up to 82%.
ETPL PDS - 010
Trajectory Pattern Mining for Urban Computing in the Cloud
The increasing pervasiveness of mobile devices, along with the use of technologies like GPS, Wi-Fi networks, RFID, and sensors, allows for the collection of large amounts of movement data. These data can be analyzed to extract descriptive and predictive models that can be properly exploited to improve urban life. From a technological viewpoint, Cloud computing can play an essential role by helping city administrators quickly acquire new capabilities and by reducing initial capital costs by means of a comprehensive pay-as-you-go solution. This paper presents a workflow-based parallel approach for discovering patterns and rules from trajectory data in a Cloud-based framework. Experimental evaluation has been carried out on both real-world and synthetic trajectory data, up to one million trajectories. The results show that, due to the high complexity and large volumes of data involved in the application scenario, the trajectory pattern mining process benefits from the scalable execution environment offered by a Cloud architecture in terms of execution time, speed-up, and scale-up.
ETPL PDS - 011
Time Series-Oriented Load Prediction Model and Migration Policies for Distributed Simulation Systems
HLA-based simulation systems are prone to load imbalances due to the lack of management of shared resources in distributed environments. Such imbalances cause these simulations to lose performance in terms of execution time. As a result, many dynamic load balancing systems have been introduced to manage distributed load. These systems use specific methods, depending on load or application characteristics, to perform the required balancing. Load prediction is a technique that has been used extensively to enhance load redistribution heuristics and prevent load imbalances. In this paper, several efficient Time Series model variants are presented and used to enhance prediction precision for large-scale distributed simulation-based systems. These variants extend and correct issues originating from the implementation of Holt's model for time series in the predictive module of a dynamic load balancing system for HLA-based distributed simulations. A set of migration decision-making techniques is also proposed to enable a prediction-based load balancing system to be independent of any particular prediction model, promoting a more modular construction.
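Holt's model itself is compact. The following restates the standard double (level plus trend) exponential smoothing that the proposed variants extend and correct, with alpha and beta as smoothing parameters:

    def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
        """One pass of Holt's linear method over a load series; returns
        the horizon-step-ahead prediction. Requires len(series) >= 2."""
        level, trend = series[0], series[1] - series[0]
        for y in series[1:]:
            prev_level = level
            level = alpha * y + (1 - alpha) * (level + trend)
            trend = beta * (level - prev_level) + (1 - beta) * trend
        return level + horizon * trend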
ETPL PDS - 012
Enabling Parallel Simulation of Large-Scale HPC Network Systems
With the increasing complexity of today's high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems, networks in particular. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today's IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at flit-level detail, using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today's high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads based on detailed network traces, an ability that is rarely offered alongside high-fidelity network simulations.
ETPL PDS - 013
PacketCloud: A Cloudlet-Based Open Platform for In-Network Services
The Internet was designed around the end-to-end principle, with the network layer providing merely a best-effort forwarding service. This design makes it challenging to add new services to the Internet infrastructure. However, as Internet connectivity becomes a commodity, users and applications increasingly demand new in-network services. This paper proposes PacketCloud, a cloudlet-based open platform for hosting in-network services. Different from standalone, specialized middleboxes, cloudlets can efficiently share a set of commodity servers among different services and serve network traffic in an elastic way. PacketCloud can help both Internet Service Providers (ISPs) and emerging application/content providers deploy their services at strategic network locations. We have implemented a proof-of-concept prototype of PacketCloud. PacketCloud introduces a small additional delay and can scale well to handle high-throughput data traffic. We have evaluated PacketCloud in both a fully functional emulated environment and the real Internet.
ETPL PDS - 015
A Survey of Task Allocation and Load Balancing in Distributed Systems
In past decades, significant attention has been devoted to task allocation and load balancing in distributed systems. Although there have been some related surveys on this subject, each of them offered only a preliminary review of the state of the art for a single type of distributed system. To correlate the studies across different types of distributed systems and build a comprehensive taxonomy of them, this survey categorizes and reviews representative studies on task allocation and load balancing according to the general characteristics of different distributed systems. First, this survey summarizes the general characteristics of distributed systems. Based on these characteristics, it then reviews the studies on task allocation and load balancing with respect to the following aspects: 1) typical control models; 2) typical resource optimization methods; 3) typical methods for achieving reliability; 4) typical coordination mechanisms among heterogeneous nodes; and 5) typical models considering network structures. For each aspect, we summarize the existing studies and discuss future research directions. Through this survey, the related studies in this area can be well understood in terms of how they satisfy the general characteristics of distributed systems.
ETPL PDS - 016
An Efficient Privacy-Preserving Ranked Keyword Search Method
Cloud data owners prefer to outsource documents in encrypted form for the purpose of privacy preservation. It is therefore essential to develop efficient and reliable ciphertext search techniques. One challenge is that the relationships between documents are normally concealed in the process of encryption, which leads to significant degradation of search accuracy. Moreover, the volume of data in data centers has grown dramatically, which makes it even more challenging to design ciphertext search schemes that can provide efficient and reliable online information retrieval over large volumes of encrypted data. In this paper, a hierarchical clustering method is proposed to support richer search semantics and to meet the demand for fast ciphertext search in a big data environment. The proposed hierarchical approach clusters the documents based on a minimum relevance threshold, and then partitions the resulting clusters into sub-clusters until the constraint on the maximum cluster size is met. In the search phase, this approach achieves linear computational complexity even as the document collection grows exponentially in size. In order to verify the authenticity of search results, a structure called the minimum hash sub-tree is also designed in this paper. Experiments have been conducted using a collection set built from IEEE Xplore. The results show that, with a sharp increase of documents in the dataset, the search time of the proposed method increases linearly whereas the search time of the traditional method increases exponentially. Furthermore, the proposed method has an advantage over the traditional method in rank privacy and the relevance of retrieved documents.
ETPL PDS - 017
Cost Minimization for Rule Caching in Software Defined Networking
Software-defined networking (SDN) is an emerging network paradigm that simplifies network management by decoupling the control plane from the data plane, such that switches become simple data forwarding devices and network management is handled by logically centralized servers. In SDN-enabled networks, network flows are managed by a set of associated rules that are maintained by switches in their local Ternary Content Addressable Memories (TCAMs), which support high-speed parallel lookup on wildcard patterns. Since TCAM is expensive and extremely power-hungry, each switch has only limited TCAM space, and it is inefficient and even infeasible to maintain all rules at local switches. On the other hand, if we eliminate TCAM occupation by forwarding all packets to the centralized controller for processing, the result is long delays and a heavy processing burden on the controller. In this paper, we strive for a fine balance between rule caching and remote packet processing by formulating a minimum weighted flow provisioning (MWFP) problem with the objective of minimizing the total cost of TCAM occupation and remote packet processing. We propose an efficient offline algorithm for the case where the network traffic is given, and two online algorithms with guaranteed competitive ratios otherwise. Finally, we conduct extensive simulation experiments using real network traffic traces. The simulation results demonstrate that our proposed algorithms can significantly reduce the total cost of remote controller processing and TCAM occupation, and that the solutions obtained are nearly optimal.
ETPL PDS - 018
A Hop-by-Hop Routing Mechanism for Green Internet
In this paper, we study energy conservation in the Internet. We observe that different traffic volumes on a link can result in different energy consumption; this is mainly due to technologies such as trunking (IEEE 802.1AX) and adaptive link rates. We design a green Internet routing scheme in which routing leads traffic along energy-efficient paths. Unlike previous studies that switch network components, such as line cards and routers, into sleep mode, we do not prune the Internet topology. We first develop a power model and validate it using real commercial routers. Instead of developing a centralized optimization algorithm, which would require additional protocols such as MPLS to be deployed in the Internet, we choose a hop-by-hop approach, which is much easier to integrate into the current Internet. We progressively develop three algorithms, which are loop-free, substantially reduce energy consumption, and jointly consider green and QoS requirements such as path stretch. We further analyze the power saving ratio, the routing dynamics, and the relationship between hop-by-hop green routing and QoS requirements. We comprehensively evaluate our algorithms through simulations on synthetic, measured, and real topologies, with synthetic and real traffic traces. We show that the power saving in the line cards can be as much as 50 percent.
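The key observation, that energy tracks traffic volume under trunking and adaptive link rates, can be caricatured with a step-shaped link power model; the constants below are illustrative placeholders, not measurements from the paper's validated router model:

    import math

    def trunk_power(load, members=4, member_capacity=10.0, watts_per_member=20.0):
        """Power draw of an IEEE 802.1AX trunk: only as many member links
        as the load requires are kept active, so power rises in steps."""
        active = min(members, max(1, math.ceil(load / member_capacity)))
        return active * watts_per_member

Green routing then amounts to choosing next hops so that traffic is placed where the next step of this staircase is cheapest, subject to QoS constraints such as path stretch.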
ETPL PDS - 019
Multi-Jagged: A Scalable Parallel Spatial Partitioning Algorithm
Geometric partitioning is fast and effective for load-balancing dynamic applications, particularly those requiring geometric locality of data (particle methods, crash simulations). We present, to our knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. In contrast to the traditional recursive coordinate bisection (RCB) algorithm, which recursively bisects subdomains perpendicular to their longest dimension until the desired number of parts is obtained, our algorithm performs recursive multi-section with a given number of parts in each dimension. By computing multiple cut lines concurrently and intelligently deciding when to migrate data while computing the partition, we minimize data movement compared to efficient implementations of recursive bisection. We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan on both real and synthetic datasets. Our experiments show that the proposed algorithm performs and scales better than RCB in terms of run time without degrading load balance. Our implementation partitions 24 billion points into 65,536 parts within a few seconds and exhibits near-perfect weak scaling up to 6K cores.
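The difference from bisection is easiest to see in one dimension: a single pass places all p-1 cuts at weight quantiles, rather than recursing on repeated bisections. A minimal sketch, with our own naming:

    def multisection_cuts(coords, weights, parts):
        """Place parts-1 cut lines along one dimension so that each slab
        carries roughly total_weight / parts."""
        order = sorted(range(len(coords)), key=lambda i: coords[i])
        total = sum(weights)
        target = total / parts
        cuts, acc, k = [], 0.0, 1
        for i in order:
            acc += weights[i]
            while k < parts and acc >= k * target:
                cuts.append(coords[i])
                k += 1
        return cuts

The full algorithm applies this idea per dimension, computing multiple cut lines concurrently and migrating data only when it pays off.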
ETPL PDS - 020
Workflow Scheduling in Multi-Tenant Cloud Computing Environments
Multi-tenancy is one of the key features of cloud computing, providing scalability and economic benefits to end-users and service providers by sharing the same cloud platform and its underlying infrastructure, with isolation of shared network and compute resources. However, resource management in the context of multi-tenant cloud computing is becoming one of the most complex tasks due to the inherent heterogeneity and resource isolation. This paper proposes a novel cloud-based workflow scheduling (CWSA) policy for compute-intensive workflow applications in multi-tenant cloud computing environments, which helps minimize the overall workflow completion time, tardiness, and cost of execution of the workflows, and utilizes idle cloud resources effectively. The proposed algorithm is compared with state-of-the-art algorithms, i.e., First Come First Served (FCFS), EASY Backfilling, and Minimum Completion Time (MCT) scheduling policies, to evaluate its performance. Further, a proof-of-concept experiment with real-world scientific workflow applications is performed to demonstrate the scalability of CWSA, which verifies the effectiveness of the proposed solution. The simulation results show that the proposed scheduling policy improves workflow performance and outperforms the aforementioned alternative scheduling policies under typical deployment scenarios.
ETPL PDS - 021
Dynamic Bin Packing for On-Demand Cloud Resource Allocation
Dynamic Bin Packing (DBP) is a variant of classical bin packing in which items may arrive and depart at arbitrary times. Existing works on DBP generally aim to minimize the maximum number of bins ever used in the packing. In this paper, we consider a new version of the DBP problem, namely the MinTotal DBP problem, which targets minimizing the total cost of the bins used over time. It is motivated by the request dispatching problem arising in cloud gaming systems. We analyze the competitive ratios of the modified versions of the commonly used First Fit, Best Fit, and Any Fit packing algorithms (the family of packing algorithms that open a new bin only when no currently open bin can accommodate the item to be packed) for the MinTotal DBP problem. We show that the competitive ratio of Any Fit packing cannot be better than μ + 1, where μ is the ratio of the maximum item duration to the minimum item duration. The competitive ratio of Best Fit packing is not bounded for any given μ. For First Fit packing, if all the item sizes are smaller than 1/β of the bin capacity (β > 1 is a constant), the competitive ratio has an upper bound of β/(β-1)·μ + 3β/(β-1) + 1. For the general case, the competitive ratio of First Fit packing has an upper bound of 2μ + 7. We also propose a Hybrid First Fit packing algorithm that achieves a competitive ratio no larger than (5/4)μ + 19/4 when μ is not known, and no larger than μ + 5 when μ is known.
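A compact First Fit sketch for this dynamic setting, where each item carries a size and a departure time and bins free space as items depart; cost accounting over time is omitted, since the point here is the packing rule the analysis studies:

    def first_fit(events, capacity=1.0):
        """events: (arrival, departure, size) tuples sorted by arrival."""
        bins = []                              # each bin: list of (departure, size)
        for arr, dep, size in events:
            for b in bins:
                b[:] = [(d, s) for d, s in b if d > arr]   # expire departed items
                if sum(s for _, s in b) + size <= capacity:
                    b.append((dep, size))
                    break
            else:
                bins.append([(dep, size)])     # open a new bin (Any Fit family)
        return bins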
ETPL PDS - 022
Deadline Guaranteed Service for Multi-Tenant Cloud Storage
It is imperative for cloud storage systems to be able to provide deadline-guaranteed services according to service level agreements (SLAs) for online services. In spite of many previous works on deadline-aware solutions, most focus on scheduling workflows or resource reservation in datacenter networks and neglect the server overload problem in cloud storage systems, which prevents deadline-guaranteed services from being provided. In this paper, we introduce a new form of SLA, which enables each tenant to specify the percentage of its requests it wishes to have served within a specified deadline. We first identify the multiple objectives (i.e., traffic and latency minimization, resource utilization maximization) in developing schemes to satisfy the SLAs. To satisfy the SLAs while achieving these objectives, we propose a Parallel Deadline Guaranteed (PDG) scheme, which schedules data reallocation (through load re-assignment and data replication) using a tree-based bottom-up parallel process. The observations from our model also motivate a deadline-strictness clustered data allocation algorithm that maps tenants with similar SLA strictness onto the same server to enhance SLA guarantees. We further enhance PDG's ability to supply SLA-guaranteed services through two algorithms: i) a prioritized data reallocation algorithm that deals with request arrival rate variation, and ii) an adaptive request retransmission algorithm that deals with SLA requirement variation. Our trace-driven experiments on a simulator and Amazon EC2 show the effectiveness of our schemes in guaranteeing the SLAs while achieving the multiple objectives.
ETPL PDS - 023
RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop's Configuration
Hadoop is a widely used implementation framework of the MapReduce programming model for large-scale data processing. Hadoop performance, however, is significantly affected by the settings of the Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming, if practical at all. This paper proposes an approach, called RFHOC, to automatically tune the Hadoop configuration parameters for optimized performance for a given application running on a given cluster. RFHOC constructs two ensembles of performance models using a random-forest approach, for the map and reduce stages respectively. Leveraging these models, RFHOC employs a genetic algorithm to automatically search the Hadoop configuration space. An evaluation of RFHOC using five typical Hadoop programs, each with five different input data sets, shows that it achieves a performance speedup of 2.11× on average, and up to 7.4×, over the recently proposed cost-based optimization (CBO) approach. In addition, RFHOC's performance benefit increases with input data set size.
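A hedged sketch of the RFHOC loop follows: a random forest fitted on (configuration, runtime) samples acts as a cheap surrogate fitness function for a genetic search over the configuration space. The parameter ranges, GA settings, and single-model simplification here are placeholders, not Hadoop's actual knobs or the paper's two-stage ensembles:

    import random
    from sklearn.ensemble import RandomForestRegressor

    def rf_ga_search(samples, runtimes, dims, gens=50, pop=40):
        """dims: list of (low, high) ranges, one per configuration parameter."""
        model = RandomForestRegressor().fit(samples, runtimes)
        popn = [[random.uniform(lo, hi) for lo, hi in dims] for _ in range(pop)]
        for _ in range(gens):
            popn.sort(key=lambda c: model.predict([c])[0])  # lower runtime is fitter
            elite = popn[:pop // 2]
            children = []
            while len(elite) + len(children) < pop:
                a, b = random.sample(elite, 2)
                child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
                j = random.randrange(len(dims))                      # point mutation
                child[j] = random.uniform(*dims[j])
                children.append(child)
            popn = elite + children
        return min(popn, key=lambda c: model.predict([c])[0])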
ETPL PDS - 024
Online Resource Scheduling Under Concave Pricing for Cloud Computing
With the booming cloud computing industry, computational resources are readily and elastically available to customers. In order to attract customers with various demands, most Infrastructure-as-a-Service (IaaS) cloud service providers offer several pricing strategies, such as pay-as-you-go, pay less per unit when you use more (the so-called volume discount), and pay even less when you reserve. The diverse pricing schemes among different IaaS service providers, or even within the same provider, form a complex economic landscape that nurtures the market of cloud brokers. By strategically scheduling multiple customers' resource requests, a cloud broker can fully take advantage of the discounts offered by cloud service providers. In this paper, we focus on how a broker can help a group of customers fully utilize the volume discount pricing strategy offered by cloud service providers through cost-efficient online resource scheduling. We present a randomized online stack-centric scheduling algorithm (ROSA) and theoretically prove a lower bound on its competitive ratio. Three special cases of the offline concave cost scheduling problem and the corresponding optimal algorithms are introduced. Our simulations show that ROSA achieves a competitive ratio close to the theoretical lower bound under these special cases. Trace-driven simulation using Google cluster data demonstrates that ROSA is superior to conventional online scheduling algorithms in terms of cost saving.
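The volume-discount structure that makes the cost concave is easy to state: the unit price drops at tier boundaries, so total cost grows sublinearly in usage. The tier widths and prices below are illustrative only:

    def tiered_cost(units, tiers=((100, 1.00), (400, 0.80), (float("inf"), 0.60))):
        """tiers: (width, unit_price) pairs; later units are cheaper, which
        is exactly what a broker aggregating many requests exploits."""
        cost, remaining = 0.0, units
        for width, price in tiers:
            take = min(remaining, width)
            cost += take * price
            remaining -= take
            if remaining <= 0:
                break
        return cost

For example, tiered_cost(450) charges 100 units at 1.00 and 350 at 0.80, so bundling two customers' 225-unit requests costs less than pricing them separately.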
ETPL PDS - 025
Circuit Ciphertext-Policy Attribute-Based Hybrid Encryption with Verifiable Delegation in Cloud Computing
In the cloud, to achieve access control and keep data confidential, data owners can adopt attribute-based encryption to encrypt the stored data. Users with limited computing power are, however, more likely to delegate most of the decryption task to the cloud servers to reduce the computing cost. As a result, attribute-based encryption with delegation has emerged. Still, there are caveats and open questions in the previous relevant works. For instance, during the delegation, the cloud servers could tamper with or replace the delegated ciphertext and return a forged computing result with malicious intent. They may also cheat eligible users by telling them that they are ineligible, for the purpose of cost saving. Furthermore, the access policies used during encryption may not be flexible enough. Since a policy for general circuits achieves the strongest form of access control, a construction for realizing circuit ciphertext-policy attribute-based hybrid encryption with verifiable delegation is considered in our work. In such a system, combined with verifiable computation and an encrypt-then-MAC mechanism, data confidentiality, fine-grained access control, and the correctness of the delegated computing results are guaranteed at the same time. Besides, our scheme achieves security against chosen-plaintext attacks under the k-multilinear Decisional Diffie-Hellman assumption. Moreover, an extensive simulation campaign confirms the feasibility and efficiency of the proposed solution.
ETPL PDS - 026
Evolutionary Multi-Objective Workflow Scheduling in Cloud
Cloud computing provides promising platforms for executing large applications, with enormous computational resources offered on demand. In a Cloud model, users are charged based on their usage of resources and the required quality of service (QoS) specifications. Although there are many existing workflow scheduling algorithms in traditional distributed or heterogeneous computing environments, they are difficult to apply directly to Cloud environments, since the Cloud differs from traditional heterogeneous environments in its service-based resource management method and pay-per-use pricing strategies. In this paper, we highlight these difficulties and model the workflow scheduling problem, which optimizes both makespan and cost, as a Multi-objective Optimization Problem (MOP) for Cloud environments. We propose an evolutionary multi-objective optimization (EMO)-based algorithm to solve this workflow scheduling problem on an Infrastructure-as-a-Service (IaaS) platform. Novel schemes for problem-specific encoding, population initialization, fitness evaluation, and genetic operators are proposed in this algorithm. Extensive experiments on real-world workflows and randomly generated workflows show that the schedules produced by our evolutionary algorithm are more stable on most of the workflows under the instance-based IaaS computing and pricing models. The results also show that our algorithm can achieve significantly better solutions than existing state-of-the-art QoS optimization scheduling algorithms in most cases. The conducted experiments are based on the on-demand instance types of Amazon EC2; however, the proposed algorithm can easily be extended to the resources and pricing models of other IaaS services.
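The two objectives make "better" a partial order; an EMO algorithm keeps the schedules that no other schedule Pareto-dominates. A one-function statement of dominance for (makespan, cost) pairs, as a generic illustration rather than the paper's operator:

    def dominates(a, b):
        """a, b: (makespan, cost) tuples. True if a is no worse in both
        objectives and strictly better in at least one."""
        return a[0] <= b[0] and a[1] <= b[1] and a != b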
ETPL PDS - 027
Coral: A Cloud-Backed Frugal File System
With simple access interfaces and flexible billing models, cloud storage has become an attractive solution to simplify storage management for both enterprises and individual users. However, traditional file systems with extensive optimizations for a local disk-based storage backend cannot fully exploit the inherent features of the cloud to obtain desirable performance. In this paper, we present the design, implementation, and evaluation of Coral, a cloud-backed file system that strikes a balance between performance and monetary cost. Unlike previous studies that treat cloud storage as just a normal backend for existing networked file systems, Coral is designed to address several key issues in optimizing cloud-based file systems, such as the data layout, block management, and billing model. With carefully designed data structures and algorithms, such as identifying semantically correlated data blocks, a kd-tree-based caching policy with self-adaptive thrashing prevention, an effective data layout, and optimal garbage collection, Coral achieves good performance and cost savings under various workloads, as demonstrated by extensive evaluations.
ETPL PDS - 028
Poris: A Scheduler for Parallel Soft Real-Time Applications in Virtualized Environments
With the prevalence of cloud computing and virtualization, more and more cloud services, including parallel soft real-time applications (PSRT applications), are running in virtualized data centers. However, current hypervisors do not provide adequate support for them because of soft real-time constraints and synchronization problems, which result in frequent deadline misses and serious performance degradation. CPU schedulers in the underlying hypervisors are central to these issues. In this paper, we identify and analyze CPU scheduling problems in hypervisors. We then design and implement a parallel soft real-time scheduler based on this analysis, named Poris, built on Xen. It addresses both soft real-time constraints and synchronization problems simultaneously. In our proposed method, priority promotion and dynamic time slice mechanisms are introduced to determine when to schedule virtual CPUs (VCPUs) according to the characteristics of soft real-time applications. In addition, considering that PSRT applications may run in a single virtual machine (VM) or across multiple VMs, we present parallel scheduling, group scheduling, and communication-driven group scheduling to accelerate synchronization in these applications and make sure that tasks finish before their deadlines under different scenarios. Our evaluation shows Poris can significantly improve the performance of PSRT applications whether they run in a single VM or across multiple VMs. For example, compared to the Credit scheduler, Poris decreases the response time of a web search benchmark by up to 91.6 percent.
ETPL PDS - 029
FiDoop-DP: Data Partitioning in Frequent Itemset Mining on Hadoop Clusters
Traditional parallel algorithms for mining frequent itemsets aim to balance load by equally partitioning data among a group of computing nodes. We begin this study by discovering a serious performance problem in existing parallel Frequent Itemset Mining algorithms: given a large dataset, the data partitioning strategies in existing solutions suffer high communication and mining overhead induced by redundant transactions transmitted among computing nodes. We address this problem by developing a data partitioning approach called FiDoop-DP using the MapReduce programming model. The overarching goal of FiDoop-DP is to boost the performance of parallel Frequent Itemset Mining on Hadoop clusters. At the heart of FiDoop-DP is a Voronoi-diagram-based data partitioning technique, which exploits correlations among transactions. Incorporating a similarity metric and the Locality-Sensitive Hashing technique, FiDoop-DP places highly similar transactions into the same data partition to improve locality without creating an excessive number of redundant transactions. We implement FiDoop-DP on a 24-node Hadoop cluster, driven by a wide range of datasets created by the IBM Quest Market-Basket Synthetic Data Generator. Experimental results reveal that FiDoop-DP reduces network and computing loads by virtue of eliminating redundant transactions on Hadoop nodes, and improves the performance of an existing parallel frequent-pattern scheme by up to 31%, with an average of 18%.
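A small MinHash sketch of the LSH-style grouping described above: transactions whose item sets are highly similar tend to collide on the same signature and can be co-located in one partition. Real LSH would band the signature into smaller buckets; the hash salts and signature length here are arbitrary choices for illustration:

    def minhash_signature(items, num_hashes=8):
        return tuple(min(hash((h, it)) for it in items)
                     for h in range(num_hashes))

    def group_by_signature(transactions):
        buckets = {}
        for t in transactions:
            key = minhash_signature(frozenset(t))
            buckets.setdefault(key, []).append(t)
        return buckets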
ETPL PDS - 030
Trends and Directions in Cloud Service Selection
With the growing popularity of cloud computing, the number of cloud service providers and services has significantly increased, and selecting the best cloud services has become a challenging task for prospective cloud users. The process of selecting cloud services involves various factors such as the characteristics and models of cloud services, user requirements and knowledge, and service level agreements (SLAs), to name a few. This paper investigates cloud service selection tools, techniques, and models, taking into account the distinguishing characteristics of cloud services. It also reviews and analyses academic research as well as commercial tools in order to identify their strengths and weaknesses in the cloud service selection process. It proposes a framework to improve cloud service selection by taking into account service capabilities, quality attributes, the user's level of knowledge, and service level agreements. The paper also envisions various directions for future research.
ETPL PDS - 031
Placement and Performance Analysis of Virtual Multicast Networks in Fat-Tree Data Center Networks
Virtualization of servers and networks is a key technique for resolving the conflict between the increasing demand for computing power and the high cost of hardware in data centers. In order to map virtual networks to physical infrastructure efficiently, designers have to make careful decisions on the allocation of limited resources, which makes placement of virtual networks in data centers a critical issue. In this paper, we study the placement of virtual networks in fat-tree data center networks. In order to meet the requirements of instant parallel data transfer between multiple computing units, we propose a model of multicast-capable virtual networks (MVNs). We then design four virtual machine (VM) placement schemes to embed MVNs into fat-tree data center networks, named Most-Vacant-Fit (MVF), Most-Compact-First (MCF), Mixed-Bidirectional-Fill (MBF), and Malleable-Shallow-Fill (MSF). All these VM placement schemes guarantee the nonblocking multicast capability of each MVN while achieving significant savings in the cost of network hardware. In addition, each VM placement scheme has its unique features: the MVF scheme has zero interference with existing computing tasks in data centers; the MCF scheme leads to the greatest cost saving; the MBF scheme simultaneously possesses the merits of MVF and MCF, and provides an adjustable parameter allowing cloud providers to achieve a preferred balance between cost and overhead; and the MSF scheme performs at least as well as MBF and possesses some additional predictable features. Finally, we compare the performance and overhead of these VM placement schemes and present simulation results to validate the theoretical results.