Hadoop Spark Online Demo

Apache Spark

What is it?

How does it work?

Benefits

Tuning

Examples

www.xoomtrainings.com

sales@xoomtrainings.com


Spark – What is it?

Open Source

Alternative to MapReduce for certain applications

A low latency cluster computing system

For very large data sets

Can be up to 100 times faster than MapReduce for –

Iterative algorithms

Interactive data mining

Used with Hadoop / HDFS

Originally released under the BSD License (Apache releases use the Apache License 2.0)


Spark – How does it work?

Uses in-memory cluster computing

Memory access faster than disk access

Has APIs in –

Scala

Java

Python

Can be accessed from Scala and Python shells

Currently an Apache incubator project
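
Why in-memory computing suits iterative algorithms: each iteration scans the same data set again, so keeping that data in memory avoids re-reading it from disk on every pass. The sketch below is plain Scala with no Spark dependency; the object name, the sample points and the fitSlope function are hypothetical, and a local Vector stands in for a cached data set.

```scala
object IterativeSketch {
  // Hypothetical in-memory data set: points (x, y) that satisfy y = 2x.
  val points: Vector[(Double, Double)] =
    Vector((1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0))

  // Fit y = w * x by gradient descent; every iteration scans all points again.
  def fitSlope(data: Vector[(Double, Double)], iterations: Int, rate: Double): Double = {
    var w = 0.0
    for (_ <- 1 to iterations) {
      // Mean gradient of the squared error with respect to w.
      val gradient = data.map { case (x, y) => 2.0 * x * (w * x - y) }.sum / data.size
      w -= rate * gradient
    }
    w
  }

  // 100 passes over the same four points.
  val slope: Double = fitSlope(points, iterations = 100, rate = 0.01)

  def main(args: Array[String]): Unit =
    println(f"fitted slope: $slope%.4f")
}
```

The loop touches the full data set 100 times. On a cluster, that repeated scan is the part that benefits from data held in memory rather than reloaded for each pass.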


Spark – Benefits

Scales to very large clusters

Uses in-memory processing for increased speed

High-level APIs –

Java, Scala, Python

Low latency shell access


Spark – Tuning

Bottlenecks can occur in the cluster via –

CPU, memory or network bandwidth

Tune data serialization method i.e. –

Java ObjectOutputStream vs Kryo

Memory Tuning –

Use primitive types

Set JVM Flags

Store objects in serialized form i.e. –

RDD Persistence: MEMORY_ONLY_SER
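
To make the points about serialization and primitive types concrete, the sketch below measures how many bytes standard Java serialization (ObjectOutputStream) produces for the same 1000 integers stored as a primitive array and as an array of boxed objects. It uses only the Java standard library; the object and method names are hypothetical, and it does not measure Kryo or Spark itself.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object SerializedSizeSketch {
  // Serialize a value with standard Java serialization and return the byte count.
  def javaSerializedSize(value: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.size()
  }

  // The same 1000 integers, once as primitives and once as boxed objects.
  val primitives: Array[Int] = Array.tabulate(1000)(i => i)
  val boxed: Array[java.lang.Integer] =
    Array.tabulate[java.lang.Integer](1000)(i => java.lang.Integer.valueOf(i))

  val primitiveSize: Int = javaSerializedSize(primitives)
  val boxedSize: Int = javaSerializedSize(boxed)

  def main(args: Array[String]): Unit =
    println(s"primitive: $primitiveSize bytes, boxed: $boxedSize bytes")
}
```

The boxed array serializes to more bytes than the primitive one because each element is written as a separate object. In Spark the serializer is chosen through the spark.serializer setting, and the MEMORY_ONLY_SER persistence level stores RDD partitions in this kind of serialized form, trading CPU time for memory.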


Spark – Examples

Example from spark-project.org, Spark job in Scala.

Showing a simple count of matching lines in a system log.

/*** SimpleJob.scala ***/
import spark.SparkContext
import SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    val logFile = "/var/log/syslog" // Should be some file on your system
    // Run locally; the jar list names the application code to ship to workers
    val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    // Cache the file contents because they are scanned twice below
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
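
The filter-and-count logic of the job can be tried without a cluster. The sketch below applies the same idea to a local Scala collection; the object name, the helper countLinesContaining and the sample lines are hypothetical stand-ins for the RDD and the log file.

```scala
object LineCountSketch {
  // Count the lines that contain the given substring.
  def countLinesContaining(lines: Seq[String], needle: String): Long =
    lines.count(_.contains(needle)).toLong

  // Hypothetical sample standing in for the contents of a log file.
  val sampleLines: Seq[String] = Seq(
    "disk mounted",
    "cable attached",
    "backup started",
    "daemon ready"
  )

  val numAs: Long = countLinesContaining(sampleLines, "a")
  val numBs: Long = countLinesContaining(sampleLines, "b")

  def main(args: Array[String]): Unit =
    // prints "Lines with a: 3, Lines with b: 2"
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
}
```

In the Spark job the same predicate is passed to filter on the cached data set, and count then runs across the partitions instead of a single local sequence.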


Contact Us

Feel free to contact us at

www.xoomtrainings.com

sales@xoomtrainings.com -- USA : +1-610-686-8077 or India : +91-404-018-3355

We offer IT project consultancy

We are happy to hear about your problems

You pay only for the hours you need to solve your problems

