“The ARM cloud server Appendices 1 to 5, including an explanation of applications”
Cy7/Cy21 ARM Cloud Servers Benelux Introduction
A Compatibles2 Paper
Appendix 1-5: ARM Cloud Servers
by Hans Noort, compatibles2@gmail.com
Q1 2015 ver 1.3
Contents
Executive Summary
APPENDIX 1 The “What Is?” Section: 1844 TB of storage, how much is that?
APPENDIX 2 The “What Is?” Section: What is GlusterFS?
    Advantages of GlusterFS
    What makes Gluster outstanding among other distributed file systems?
APPENDIX 2 The “What Is?” Section: What is GlusterFS? (1)
    Storage concepts in GlusterFS
APPENDIX 2 The “What Is?” Section: What is GlusterFS? (2)
APPENDIX 3 The “What Is?” Section: What is Ceph?
APPENDIX 3 The “What Is?” Section: What is Ceph? (1)
APPENDIX 3 The “What Is?” Section: What is Ceph? (2)
APPENDIX 4 The “What Is?” Section: What is Apache Hadoop?
APPENDIX 5 The “What Is?” Section: What is OpenStack?
Q1 2015 ver. 1.3 Appendix 1-5
Compatibles2 Cloud Servers
Section 1
Executive Summary

To meet the growing demand for Green IT solutions, combined with Cloud Computing, we are introducing extremely energy-efficient Cloud Servers. With the entirely unique CyOne, Cy7 and Cy21 ARM-based servers, whose distinguishing quality is unlimited scalability, we offer revolutionary yet very affordable solutions. These are server solutions based on ARM CPUs, aimed at hosting companies, data centers and in-house ICT customers who currently face limitations in power consumption and floor space and who also want to reduce costs. But think also of cloud test environments, research, education, data analysis and specific new applications, such as new forms of social media.

With a server density of 532 micro servers, 456 HDDs and 76 SSDs (together 1844 TB) in a single server rack, at a power draw of at most 7.04 kW under full load, ARM-based Cloud Servers are among the most innovative ICT solutions and deliver immediate energy savings of up to 50% compared with Intel-based servers.

In these separate Appendices 1 to 5 we explain commonly used cloud server terms and software. (Almost entirely technical content)

In Part One we discuss the origins and development of the Cloud, the explosive growth of data, and the innovation in cloud servers needed to keep up with the growing demand for data. (Non-technical content)

In Part Two we take a closer look at the answers ARM-based servers give to the demand for scalable computing power and storage capacity: lower purchase costs, less rack space, less maintenance, energy costs cut in half, and reduced cooling requirements. (Partly technical content)
Appendices

APPENDIX 1 The “What Is?” Section: 1844 TB of storage, how much is that?

Processor or Virtual Storage (binary units):
1 Bit = Binary Digit
8 Bits = 1 Byte
1024 Bytes = 1 Kilobyte
1024 Kilobytes = 1 Megabyte
1024 Megabytes = 1 Gigabyte
1024 Gigabytes = 1 Terabyte
1024 Terabytes = 1 Petabyte
1024 Petabytes = 1 Exabyte
1024 Exabytes = 1 Zettabyte
1024 Zettabytes = 1 Yottabyte
1024 Yottabytes = 1 Brontobyte
1024 Brontobytes = 1 Geopbyte

Disk Storage (decimal units):
1 Bit = Binary Digit
8 Bits = 1 Byte
1000 Bytes = 1 Kilobyte
1000 Kilobytes = 1 Megabyte
1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte
1000 Yottabytes = 1 Brontobyte
1000 Brontobytes = 1 Geopbyte
Almost 2 Petabytes! But how much data is that? A Terabyte is roughly 1,000 Gigabytes. Not so long ago it seemed unthinkable that 1 TB mechanical hard disks would ever be for sale, yet today a capacity of 4 to 8 TB per drive is perfectly normal in a home PC, and a 1 TB SSD is readily available. 1 TB is roughly the storage capacity of 300 hours of video, 1,000 complete encyclopedias, or 3.6 million medium-resolution photos. A Petabyte is roughly 1,000 Terabytes (!), an amount of data that is already almost impossible to imagine. 1 Petabyte could hold the information of 20 million four-drawer filing cabinets full of text, or put differently, 500 billion pages of written text.
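The two unit columns above can be checked against each other with a few lines of Python; here the rack's advertised 1844 TB is taken as decimal (vendor) terabytes:

```python
# Convert the rack's 1844 TB (decimal, as disk vendors count) into
# decimal petabytes and into binary terabytes (as memory is counted).
TB = 1000**4    # decimal terabyte: disk storage column
TiB = 1024**4   # binary terabyte: processor/virtual storage column

total_bytes = 1844 * TB
print(total_bytes / 1000**5)      # 1.844 decimal petabytes
print(round(total_bytes / TiB))   # ~1677 binary terabytes: same data, smaller number
```

The roughly 9% gap between 1844 and 1677 is exactly the gap between the two columns of the table: each step of 1000 versus 1024 compounds.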
APPENDIX 2 The “What Is?” Section: What is GlusterFS?

GlusterFS is a distributed file system designed to run in user space, using File System in Userspace (FUSE). It is a software-only file system, which accounts for much of its flexibility. The following figure schematically represents the position of GlusterFS in a hierarchical model. By default, GlusterFS uses the TCP protocol.
[Figure: GlusterFS Design]

Advantages of GlusterFS
Innovation – It eliminates metadata servers, which can dramatically improve performance and helps unify data and objects.
Elasticity – It adapts to growth and shrinkage of the data.
Linear scaling – It scales to petabytes and beyond.
Simplicity – It is easy to manage and, running in user space, independent of the kernel.

What makes Gluster outstanding among other distributed file systems?
Scalable – The absence of a metadata server makes for a faster file system.
Affordable – It deploys on commodity hardware.
Flexible – As noted earlier, GlusterFS is a software-only file system; data is stored on native file systems such as ext4 or xfs.
Open source – GlusterFS is currently maintained by Red Hat Inc., a billion-dollar open source company, as part of Red Hat Storage.
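The "no metadata server" point deserves a sketch. GlusterFS locates files with an elastic hashing algorithm rather than by asking a central server; a toy version of the idea follows (the hash function and brick names are illustrative, not Gluster's actual algorithm):

```python
import hashlib

# Toy hash-based placement: any client can compute which brick holds a
# file from the path alone, so no metadata server is ever consulted.
bricks = ["server1:/export/brick", "server2:/export/brick",
          "server3:/export/brick", "server4:/export/brick"]

def brick_for(path: str) -> str:
    digest = hashlib.md5(path.encode()).hexdigest()
    return bricks[int(digest, 16) % len(bricks)]

# Deterministic: every client independently computes the same location.
assert brick_for("/docs/report.txt") == brick_for("/docs/report.txt")
```

Because lookup is pure computation, there is no central hotspot to slow down or fail, which is exactly the "faster file system" claim above.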
APPENDIX 2 The “What Is?” Section: What is GlusterFS? (1)

Storage concepts in GlusterFS
Brick – Any directory that is meant to be shared within the trusted storage pool.
Trusted Storage Pool – A collection of these shared files and directories, based on the designed protocol.
Block Storage – Devices through which data is moved across systems in the form of blocks.
Cluster – In Red Hat Storage, "cluster" and "trusted storage pool" both mean a collaboration of storage servers based on a defined protocol.
Distributed File System – A file system in which data is spread over different nodes and users can access a file without knowing its actual location; the user does not notice that the access is remote.
FUSE – A loadable kernel module that lets users create file systems above the kernel without touching kernel code.
glusterd – The GlusterFS management daemon, the backbone of the file system, which runs the whole time the servers are active.
POSIX – The Portable Operating System Interface, a family of IEEE standards that addresses compatibility between Unix variants in the form of an application programming interface (API).
RAID – Redundant Array of Independent Disks, a technology that increases storage reliability through redundancy.
Subvolume – A brick after being processed by at least one translator.
Translator – The piece of code that performs the basic actions initiated by the user from the mount point; it connects one or more subvolumes.
Volume – A logical collection of bricks. All operations are based on the different types of volumes created by the user.
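The translator concept above is essentially a pipeline: each translator wraps one or more subvolumes and transforms the file operations passing through it. A toy model of a replication translator (illustrative only; names and classes are made up, not Gluster code):

```python
# Toy translator stack: a translator wraps subvolumes and transforms
# the operations flowing through it, as described above.
class Brick:
    def __init__(self, name: str):
        self.name = name

    def write(self, path: str, data: bytes) -> str:
        return f"{self.name} stored {path} ({len(data)} bytes)"

class ReplicateTranslator:
    """Fan every write out to all subvolumes (replication-style)."""
    def __init__(self, subvolumes):
        self.subvolumes = subvolumes

    def write(self, path: str, data: bytes):
        return [sub.write(path, data) for sub in self.subvolumes]

volume = ReplicateTranslator([Brick("brick-a"), Brick("brick-b")])
results = volume.write("/f.txt", b"hello")
assert len(results) == 2   # one acknowledgement per brick
```

Stacking different translators (distribute, replicate, stripe) over the same bricks is what produces the different volume types discussed next.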
APPENDIX 2 The “What Is?” Section: What is GlusterFS? (2)

Different Types of Volumes
[Figure: representations of the different types of volumes. Combinations of these basic volume types are also allowed.]
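For readers missing the figure: in a distributed volume each file lands on exactly one brick, in a replicated volume each file is copied to every brick in the replica set, and the two can be combined. A toy model of a distributed-replicated volume (illustrative only, not GlusterFS's real placement code):

```python
import hashlib

# Toy distributed-replicated volume: bricks are grouped into replica
# pairs; a file hashes to one pair (distribute step) and is then copied
# to both bricks in that pair (replicate step).
replica_pairs = [
    ("srv1:/brick", "srv2:/brick"),
    ("srv3:/brick", "srv4:/brick"),
]

def place(path: str):
    h = int(hashlib.md5(path.encode()).hexdigest(), 16)
    pair = replica_pairs[h % len(replica_pairs)]   # distribute
    return list(pair)                              # replicate

copies = place("/home/user/file.txt")
assert len(copies) == 2   # two replicas, both inside a single pair
```

This combination gives capacity scaling from distribution plus fault tolerance from replication, which is why it is a common deployment choice.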
APPENDIX 3 The “What Is?” Section: What is Ceph?

Ceph is a free-software storage platform designed to present object, block, and file storage from a single distributed computer cluster. Ceph's main goals are to be completely distributed without a single point of failure, scalable to the exabyte level, and freely available. The data is replicated, making it fault-tolerant, using an algorithm called CRUSH (Controlled Replication Under Scalable Hashing) to ensure that data is evenly distributed across the cluster.
Ceph software runs on commodity hardware. The system is designed to be both self-healing and self-managing, and strives to reduce both administrator and budget overhead.

OSDs

OSD stands for Object Storage Device, and can be thought of as a physical disk. An OSD is actually a directory (e.g. /var/lib/ceph/osd-1) that Ceph makes use of, residing on a regular filesystem, though it should be treated as opaque for the purposes of using it with Ceph. A feature of Ceph is that it can tolerate the loss of OSDs. This means you can theoretically achieve excellent utilisation of storage devices by obviating the need for RAID on every single device.
APPENDIX 3 The “What Is?” Section: What is Ceph? (1)

Placement Groups

Placement groups, also referred to as PGs, help ensure performance and scalability, as tracking metadata for each individual object would be too costly. A PG collects objects from the next layer up and manages them as a collection. It represents a mostly static mapping to one or more underlying OSDs. Replication is done at the PG layer: the degree of replication (number of copies) is set higher up, at the pool level, and all PGs in a pool will replicate stored objects into multiple OSDs.

Pools

A pool is the layer at which most user interaction takes place: the important stuff like GET, PUT and DELETE actions for objects in a pool. Pools contain a number of PGs, not shared with other pools (if you have multiple pools). The number of PGs in a pool is defined when the pool is first created, and can't be changed later. You can think of PGs as providing a hash mapping of objects into OSDs, ensuring that the OSDs are filled evenly as objects are added to the pool.

CRUSH maps

CRUSH mappings are specified on a per-pool basis, and serve to skew the distribution of objects into OSDs according to administrator-defined policy. This is important for ensuring that replicas don't end up on the same disk/host/rack/etc., which would defeat the entire point of having replica copies. A CRUSH map is written by hand, then compiled and passed to the cluster. So:
Many objects map to one PG.
Each object maps to exactly one PG.
One PG maps to a list of OSDs; the first in the list is the primary and the rest are replicas.
Many PGs can map to one OSD.
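The four mapping rules above can be sketched in a few lines; the hash and the PG-to-OSD table here are toy stand-ins for Ceph's real placement machinery (object hashing plus the compiled CRUSH map):

```python
import hashlib

pg_num = 8  # fixed when the pool is created
# Toy PG -> OSD table; in Ceph this mapping comes from the CRUSH map.
pg_to_osds = {pg: [(pg + i) % 4 for i in range(3)] for pg in range(pg_num)}

def pg_for(obj_name: str) -> int:
    """Each object maps to exactly one PG (many objects per PG)."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return h % pg_num

def osds_for(obj_name: str):
    """A PG maps to a list of OSDs: first is the primary, rest replicas."""
    return pg_to_osds[pg_for(obj_name)]

osds = osds_for("my-object")
primary, replicas = osds[0], osds[1:]
assert len(replicas) == 2
```

Note that with 8 PGs and 4 OSDs, several PGs necessarily share each OSD, illustrating the last rule.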
A PG is nothing but a grouping of objects: you configure the number of PGs you want, and all of your stored objects are evenly distributed over the PGs. So a PG explicitly does NOT represent a fixed amount of storage; it represents 1/pg_num'th of whatever storage you happen to have on your OSDs.

Ceph services

What the lower layers ultimately provide is a RADOS cluster: a Reliable Autonomic Distributed Object Store. At a practical level this translates to storing opaque blobs of data (objects) in high-performance shared storage. Because RADOS is fairly generic, it is ideal for building more complex systems on top. One of these is RBD.
APPENDIX 3 The “What Is?” Section: What is Ceph? (2)

RBD

As the name suggests, a RADOS Block Device (RBD) is a block device stored in RADOS. RBD offers useful features on top of raw RADOS objects. From the official docs:

RBDs are striped over multiple PGs for performance
RBDs are resizable
Thin provisioning means on-disk space isn’t used until actually required
RBD also takes advantage of RADOS capabilities such as snapshotting and cloning.

CephFS

CephFS is a POSIX-compliant clustered filesystem implemented on top of RADOS. This is very elegant because the lower layers of the stack already provide powerful features (such as snapshotting), while the CephFS layer only needs to translate them into a usable filesystem. CephFS isn't considered ready for production just yet, but RADOS and RBD are.
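The striping and thin-provisioning points are easy to picture together: the block device is cut into fixed-size chunks, chunk i becomes RADOS object i, and objects that were never written need never exist. A toy illustration (the object size and naming scheme are made up for the example, not RBD's real on-disk layout):

```python
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB chunks for this toy example

def object_for_offset(image: str, offset: int):
    """Map a byte offset in the block device to (object name, offset inside it)."""
    index = offset // OBJECT_SIZE
    return f"{image}.{index:016x}", offset % OBJECT_SIZE

# A write at byte 9 MiB touches only chunk 2 of the image; with thin
# provisioning, chunks 0 and 1 need never be allocated on disk.
name, inner = object_for_offset("disk1", 9 * 1024 * 1024)
assert name.endswith("2") and inner == 1024 * 1024
```

Because consecutive chunks hash to different PGs, sequential block I/O fans out across many OSDs, which is where the performance claim comes from.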
APPENDIX 4 The “What Is?” Section: What is Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure. The project includes these modules:
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
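The MapReduce model in that last module is simple enough to mimic in plain Python: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step folds each group. Here is the canonical word-count example as a single-process sketch (an illustration of the model, not Hadoop code):

```python
from collections import defaultdict

def map_step(line):
    # Emit (word, 1) for every word, as a Hadoop mapper would.
    return [(word, 1) for word in line.split()]

def reduce_step(word, counts):
    # Sum the counts for one key, as a Hadoop reducer would.
    return word, sum(counts)

lines = ["big data big clusters", "big clusters"]

# Shuffle: group all mapped pairs by key.
groups = defaultdict(list)
for line in lines:
    for word, one in map_step(line):
        groups[word].append(one)

result = dict(reduce_step(w, c) for w, c in groups.items())
print(result)  # {'big': 3, 'data': 1, 'clusters': 2}
```

In a real cluster, the map calls run in parallel near the data on HDFS, the shuffle moves pairs across the network, and YARN schedules all of it; the program structure stays this simple.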
APPENDIX 5 The “What Is?” Section: What is OpenStack?

OpenStack is a free and open-source cloud computing software platform. It can best be described as a massively scalable cloud operating system. Users primarily deploy it as an infrastructure-as-a-service (IaaS) solution. The technology consists of a series of interrelated projects that control pools of processing, storage, and networking resources throughout a data center, which users manage through a web-based dashboard, command-line tools, or a RESTful API. OpenStack.org releases it under the terms of the Apache License.

OpenStack began in 2010 as a joint project of Rackspace Hosting and NASA. Currently it is managed by the OpenStack Foundation, a non-profit corporate entity established in September 2012 to promote OpenStack software and its community. More than 200 companies have joined the project, including Arista Networks, AT&T, AMD, Avaya, Canonical, Cisco, Dell, EMC, Ericsson, Go Daddy, Hewlett-Packard, IBM, Intel, Mellanox, Mirantis, NEC, NetApp, Nexenta, Oracle, Red Hat, SUSE Linux, VMware and Yahoo!
OpenStack is a global collaboration of developers and cloud computing technologists producing the ubiquitous open source cloud computing platform for public and private clouds. The project aims to deliver solutions for all types of clouds by being simple to implement, massively scalable, and feature rich. The technology consists of a series of interrelated projects delivering the components of a cloud infrastructure solution.

Who uses OpenStack? Corporations, service providers, VARs, SMBs, researchers, and global data centers looking to deploy large-scale private or public clouds, leveraging the support and resulting technology of a global open source community.

Why open matters: all of the code for OpenStack is freely available under the Apache 2.0 license. Anyone can run it, build on it, or submit changes back to the project. We strongly believe that an open development model is the only way to foster badly needed cloud standards, remove the fear of proprietary lock-in for cloud customers, and create a large ecosystem that spans cloud providers.