Apache Spark
● What is it?
● How does it work?
● Benefits
● Tuning
● Examples
www.xoomtrainings.com
sales@xoomtrainings.com
Spark – What is it?
● Open source
● An alternative to MapReduce for certain applications
● A low-latency cluster computing system
● Designed for very large data sets
● Can be up to 100 times faster than MapReduce for:
  – Iterative algorithms
  – Interactive data mining
● Works with Hadoop / HDFS
● Originally released under the BSD License; now distributed under the Apache License 2.0
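Iterative algorithms gain the most because the working data set stays in memory between iterations. A chain of MapReduce jobs would instead re-read it from disk on every pass. The sketch below assumes a local master and a tiny hypothetical data set. It runs gradient descent toward the mean of some numbers and rescans the same cached RDD on each iteration. In practice the input would come from `sc.textFile` on HDFS.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IterativeMean {
  // Gradient descent on f(w) = mean((w - x)^2) / 2, whose minimum is the mean of the data.
  // Each iteration launches a Spark job over the same RDD.
  def fit(data: RDD[Double], iterations: Int, learningRate: Double): Double = {
    val n = data.count()
    var w = 0.0
    for (_ <- 1 to iterations) {
      val current = w // copy into a val so the closure captures a fixed value
      val gradient = data.map(x => current - x).sum() / n
      w -= learningRate * gradient
    }
    w
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("IterativeMean").setMaster("local[2]"))
    // cache() keeps the partitions in memory after the first job, so later passes skip recomputation
    val data = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0, 5.0)).cache()
    println(f"Estimated mean: ${fit(data, iterations = 50, learningRate = 0.5)}%.4f")
    sc.stop()
  }
}
```

The `IterativeMean` object, learning rate and iteration count are illustrative choices, not part of Spark.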
Spark – How does it work?
● Uses in-memory cluster computing
● Memory access is faster than disk access
● Has APIs in:
  – Scala
  – Java
  – Python
● Can be used interactively from the Scala and Python shells
● Entered the Apache Incubator in 2013 and became a top-level Apache project in 2014
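In the Scala shell (`spark-shell`), a SparkContext is already available as `sc`. The standalone sketch below assumes a local master, with a generated range of numbers standing in for real input. It illustrates the in-memory model: `persist` asks Spark to keep computed partitions in memory, so the second action reuses them instead of recomputing the whole lineage. The `InMemoryReuse` object and `stats` helper are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object InMemoryReuse {
  // Returns (sum of squares, number of even squares) for 1..n.
  def stats(sc: SparkContext, n: Int): (Double, Long) = {
    // In practice this RDD would come from sc.textFile(...) on HDFS.
    val squares = sc.parallelize(1 to n).map(k => k.toLong * k)

    // MEMORY_ONLY is the level cache() uses; partitions that do not fit are recomputed when needed.
    squares.persist(StorageLevel.MEMORY_ONLY)

    val total = squares.sum()                      // first action computes and caches the partitions
    val evens = squares.filter(_ % 2 == 0).count() // second action reads the cached partitions
    squares.unpersist()
    (total, evens)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("InMemoryReuse").setMaster("local[2]"))
    val (total, evens) = stats(sc, 1000)
    println(s"sum = $total, even squares = $evens")
    sc.stop()
  }
}
```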
Spark – Benefits
● Scales to very large clusters
● Uses in-memory processing for increased speed
● High-level APIs:
  – Java, Scala, Python
● Low-latency shell access
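The classic word count shows what the high-level API buys. It is a short chain of collection-like transformations (`flatMap`, `map`, `reduceByKey`) rather than hand-written mapper and reducer classes. The sketch below assumes a local master and a couple of hypothetical input lines. The `WordCount` object and `countWords` helper are illustrative names.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  // Returns a map from lower-cased word to its number of occurrences.
  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines)
      .flatMap(_.split("\\s+"))           // split each line into words
      .filter(_.nonEmpty)                 // drop empty tokens produced by leading whitespace
      .map(word => (word.toLowerCase, 1))
      .reduceByKey(_ + _)                 // sum the counts per word across partitions
      .collect()                          // bring the (small) result back to the driver
      .toMap

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount").setMaster("local[2]"))
    val counts = countWords(sc, Seq("to be or not to be", "that is the question"))
    counts.toSeq.sortBy(-_._2).foreach { case (w, c) => println(s"$w: $c") }
    sc.stop()
  }
}
```

Calling `collect()` is only appropriate here because the result is small. For large outputs, the RDD would normally be written out with `saveAsTextFile` instead.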
Spark – Tuning
● Bottlenecks can occur in the cluster via:
  – CPU, memory or network bandwidth
● Tune the data serialization method, e.g.:
  – Java ObjectOutputStream vs Kryo
● Memory tuning:
  – Use primitive types and arrays rather than boxed objects and standard collection classes
  – Set JVM flags
  – Store RDDs in serialized form using RDD persistence, e.g. the MEMORY_ONLY_SER storage level
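The sketch below brings the three tuning points together. It switches `spark.serializer` to Kryo and registers a class with `registerKryoClasses`. It passes a JVM flag to executors through `spark.executor.extraJavaOptions` and persists an RDD with `StorageLevel.MEMORY_ONLY_SER`. The `Reading` case class, the sensor data and the choice of `-XX:+UseCompressedOops` are assumptions for illustration. In local mode the executor runs inside the driver JVM, so the executor flag has no effect there.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical record type; its fields are primitives (Int, Double) rather than boxed objects.
case class Reading(sensorId: Int, value: Double)

object TunedJob {
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("TunedJob")
      .setMaster("local[2]")
      // Use Kryo instead of Java ObjectOutputStream serialization.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registered classes are written as a small ID instead of the full class name.
      .registerKryoClasses(Array(classOf[Reading]))
      // Example executor JVM flag: compressed 4-byte object pointers on heaps under 32 GB.
      .set("spark.executor.extraJavaOptions", "-XX:+UseCompressedOops")

  // Average value per sensor, keeping the input cached as serialized bytes.
  def averageBySensor(sc: SparkContext, readings: Seq[Reading]): Map[Int, Double] = {
    val rdd = sc.parallelize(readings)
    // MEMORY_ONLY_SER: more CPU to read each record, but much less memory than plain objects.
    rdd.persist(StorageLevel.MEMORY_ONLY_SER)
    val result = rdd
      .map(r => (r.sensorId, (r.value, 1)))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2)) // (sum, count) per sensor
      .mapValues { case (sum, count) => sum / count }
      .collect()
      .toMap
    rdd.unpersist()
    result
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    val readings = Seq(Reading(1, 2.0), Reading(1, 4.0), Reading(2, 5.0))
    averageBySensor(sc, readings).foreach { case (id, avg) => println(s"sensor $id: $avg") }
    sc.stop()
  }
}
```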
Spark – Examples
● Example adapted from spark-project.org: a Spark job in Scala.
● Counts the lines of a system log that contain "a" and the lines that contain "b".

    /*** SimpleJob.scala ***/
    import org.apache.spark.{SparkConf, SparkContext}

    object SimpleJob {
      def main(args: Array[String]): Unit = {
        val logFile = "/var/log/syslog" // Should be some file on your system
        // "local" runs Spark inside this process with a single worker thread
        val conf = new SparkConf().setAppName("Simple Job").setMaster("local")
        val sc = new SparkContext(conf)
        // Load the file as an RDD of lines in 2 partitions; cache it because it is scanned twice
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
        sc.stop()
      }
    }
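The job above relies on `cache()` because it scans the log twice, once per filter. An alternative is to compute both counts in a single pass with `aggregate`. This folds each line into a per-partition pair of counters and then merges the partition results. The sketch assumes the same log path as the job above, overridable by a command-line argument. `SinglePassCounts` and `countAB` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SinglePassCounts {
  // Returns (lines containing "a", lines containing "b") in one scan of the data.
  def countAB(lines: RDD[String]): (Long, Long) =
    lines.aggregate((0L, 0L))(
      // seqOp: fold one line into a partition's running counts
      (acc, line) => (
        acc._1 + (if (line.contains("a")) 1 else 0),
        acc._2 + (if (line.contains("b")) 1 else 0)),
      // combOp: merge the counts from two partitions
      (x, y) => (x._1 + y._1, x._2 + y._2))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SinglePassCounts").setMaster("local[2]"))
    val logFile = if (args.nonEmpty) args(0) else "/var/log/syslog"
    val (numAs, numBs) = countAB(sc.textFile(logFile, 2))
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    sc.stop()
  }
}
```

The trade-off is that the single pass needs no cached copy of the data. The two-filter version is easier to read and extend.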
Contact Us
● Feel free to contact us at:
  – www.xoomtrainings.com
  – sales@xoomtrainings.com
  – USA: +1-610-686-8077 or India: +91-404-018-3355
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems