Scala & Spark
The following topics will be covered in our
Scala & Spark Online Training:
Copyright @ 2015 Learntek. All Rights Reserved.
What is Scala?
➢ Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. The name comes from "Scalable Language". Scala is a hybrid language that smoothly integrates the features of object-oriented and functional programming, and it compiles to bytecode that runs on the Java Virtual Machine. Scala was created by Martin Odersky and first released in 2003.
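As a rough illustration of what "integrating object-oriented and functional features" can look like, the sketch below defines a small class hierarchy and then processes it with higher-order functions. The names `ShapesDemo`, `Shape`, `Circle`, and `Rectangle` are made up for this example and are not part of the course material.

```scala
object ShapesDemo {
  // Object-oriented side: a trait with concrete implementations.
  trait Shape { def area: Double }
  case class Circle(radius: Double) extends Shape {
    def area: Double = math.Pi * radius * radius
  }
  case class Rectangle(width: Double, height: Double) extends Shape {
    def area: Double = width * height
  }

  // Functional side: immutable data transformed with higher-order functions.
  val shapes: List[Shape] = List(Rectangle(2.0, 3.0), Rectangle(1.0, 4.0), Circle(1.0))
  val totalArea: Double = shapes.map(_.area).sum
  val largeShapes: List[Shape] = shapes.filter(_.area > 5.0)
}
```

Here `map` and `filter` take functions as arguments, while `Shape` and its subclasses use ordinary inheritance, so both styles appear in the same few lines.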
Why Scala?
• Scala is a type-safe JVM language that combines object-oriented and functional programming features in a concise yet powerful language.
• Scala offers a "better Java" alternative: it keeps its syntax close to Java's, which minimizes the learning curve for Java developers.
• Scala was designed to improve on the features of Java that are restrictive, overly tedious, or frustrating.
What is Spark?
• Spark is a cluster computing technology designed for fast computation, and it is commonly run on Hadoop clusters. It generalizes the MapReduce programming model so that it can be used efficiently for more types of computation, such as interactive queries and stream processing. Spark can use Hadoop in two ways: for storage (HDFS) and for processing (running on YARN). Because Spark also has its own standalone cluster manager, it can use Hadoop for storage only.
Why Spark?
• Spark originated at UC Berkeley's AMPLab and is now maintained by the Apache Software Foundation. It was built to speed up Hadoop-style data processing.
• The main feature of Spark is in-memory cluster computing, which greatly increases the processing speed of an application.
• Spark is designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries, and streaming applications, which reduces the management burden of maintaining separate tools.
Introduction to Scala
• Overview of Scala
• Installing Scala
• Scala Basics
• IDE for Scala
Scala Programming
• Variables & Methods
• Literals
• Reserved Words
• Operators
• Precedence Rules
• If Expression
• For Expression
• Exception Handling with Try Expression
• Match Expression
• While Loops
• Do-While Loops
• Implicit Conversion
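Several of the topics above (if, for, Try, and match expressions) share one idea: in Scala they produce values instead of only performing actions. The following is a minimal sketch of that idea; `ExpressionsDemo`, `sign`, and `parsePort` are hypothetical names chosen for illustration.

```scala
import scala.util.{Failure, Success, Try}

object ExpressionsDemo {
  // `if` is an expression: it yields a value.
  def sign(n: Int): String =
    if (n < 0) "negative" else if (n == 0) "zero" else "positive"

  // A `for` expression with a guard and `yield` builds a new collection.
  val evenSquares: List[Int] = (for (i <- 1 to 6 if i % 2 == 0) yield i * i).toList

  // `Try` captures an exception as a value; `match` then inspects the result.
  def parsePort(text: String): String =
    Try(text.trim.toInt) match {
      case Success(port) if port > 0 && port < 65536 => s"port $port"
      case Success(other)                            => s"out of range: $other"
      case Failure(_)                                => "not a number"
    }
}
```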
Functions in Scala
• Methods
• First-Class Functions
• Higher-Order Methods
• Function Literals
• Partially Applied Functions
• Tail Recursion
• Closures
• Currying
• Control Abstraction
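The function topics listed here can be sketched briefly in one place. The example below is illustrative only and assumes nothing beyond the standard library; all names (`FunctionsDemo`, `applyTwice`, `scale`, and so on) are invented for the sketch.

```scala
import scala.annotation.tailrec

object FunctionsDemo {
  // Higher-order method: takes a function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Function literal assigned to a value (functions are first-class).
  val increment: Int => Int = n => n + 1

  // Curried method: parameters are split into separate lists.
  def scale(factor: Int)(x: Int): Int = factor * x

  // Partially applied function: fix the first list, leave the second open.
  val triple: Int => Int = scale(3) _

  // Closure: the returned function captures `offset` from the enclosing scope.
  def makeAdder(offset: Int): Int => Int = x => x + offset

  // Tail recursion: the recursive call is the last operation, and
  // @tailrec makes the compiler reject the method if it is not.
  def factorial(n: Int): BigInt = {
    @tailrec
    def loop(remaining: Int, acc: BigInt): BigInt =
      if (remaining <= 1) acc else loop(remaining - 1, acc * remaining)
    loop(n, BigInt(1))
  }
}
```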
Traits & OOP in Scala
• Traits
• Classes & Objects
• Abstract Class
• Access Modifiers
• Functional Programming
• Scala Class Hierarchy
• Package and Imports
Case Class & Pattern Matching
• Pattern Types
• Pattern Guard
• Sealed Class
• Option Type
• Extractor
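To show how these pieces fit together, here is a small sketch that uses a sealed hierarchy of case classes, a pattern guard, the Option type, and a custom extractor. The `Payment` types and the `Email` extractor are hypothetical examples, not taken from the course.

```scala
object PatternDemo {
  // A sealed trait lets the compiler check that matches are exhaustive.
  sealed trait Payment
  case class Card(number: String, amount: Double) extends Payment
  case class Cash(amount: Double) extends Payment

  // A custom extractor: `unapply` returns an Option to signal match or no match.
  object Email {
    def unapply(text: String): Option[(String, String)] =
      text.split("@") match {
        case Array(user, domain) if user.nonEmpty && domain.nonEmpty => Some((user, domain))
        case _                                                       => None
      }
  }

  def describe(p: Payment): String = p match {
    case Card(_, amount) if amount > 1000 => "large card payment" // pattern guard
    case Card(_, _)                       => "card payment"
    case Cash(_)                          => "cash payment"
  }

  // Option makes the "no result" case explicit instead of returning null.
  def domainOf(text: String): Option[String] = text match {
    case Email(_, domain) => Some(domain)
    case _                => None
  }
}
```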
Scala Collections
• Immutable and Mutable Collections
• Arrays
• Sets
• Lists
• Tuples
• Maps
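The difference between immutable and mutable collections is easiest to see side by side. The sketch below uses only standard library types; the object and value names are placeholders for illustration.

```scala
import scala.collection.mutable

object CollectionsDemo {
  // Immutable List: "adding" returns a new list, the original is unchanged.
  val base: List[Int] = List(1, 2, 3)
  val extended: List[Int] = 0 :: base

  // Mutable buffer: updated in place.
  val buffer: mutable.ArrayBuffer[Int] = mutable.ArrayBuffer(1, 2, 3)
  buffer += 4

  // Array: fixed size, but its elements can be reassigned.
  val scores: Array[Int] = Array(10, 20, 30)
  scores(0) = 15

  // Set drops duplicates, a tuple groups values of different types,
  // and a Map stores key-value pairs.
  val unique: Set[Int] = Set(1, 2, 2, 3)
  val pair: (String, Int) = ("spark", 2)
  val ports: Map[String, Int] = Map("http" -> 80, "https" -> 443)
  val withSsh: Map[String, Int] = ports + ("ssh" -> 22)
}
```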
Introduction to Spark
• Problems with Traditional Large-Scale Systems
• Introducing Spark
• What is Spark?
Spark Basics
• Spark Installation
• Configuring HDP 2.4 (or 2.5) on a local machine
• Spark Shell
• Storage layers for Spark
• Overview of Spark architecture
• Initializing a SparkContext and building applications
IDEs for Spark Applications
• SBT and its overview
• IntelliJ
• Eclipse
• Resolving dependencies for Spark applications
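For orientation, resolving Spark dependencies with SBT usually comes down to a few lines in `build.sbt`. The fragment below is a sketch under assumptions: the project name is hypothetical, and the Scala and Spark versions are examples that should be replaced with the versions your cluster actually runs.

```scala
// build.sbt (illustrative; versions are assumptions, match them to your cluster)
name := "spark-training-example" // hypothetical project name
version := "0.1.0"
scalaVersion := "2.12.18"

// %% appends the Scala binary version (here _2.12) to the artifact name.
// "provided" keeps Spark out of the packaged jar, since the cluster supplies it.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8" % "provided"
```

Both IntelliJ and Eclipse can import a project from a build definition like this, which is one reason to keep dependency information in SBT and not in IDE settings.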
RDDs
• RDD Basics
• RDD Transformations and Actions
• Lazy Evaluation
• Element-wise Transformations
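In Spark, transformations such as `map` and `filter` only describe a computation, and nothing runs until an action such as `collect` or `count` is called. The sketch below is not Spark code: it is a plain-Scala analogy that uses a standard library `Iterator` to mimic that behaviour without a cluster, and the names `LazyDemo` and `evaluations` are invented for the illustration.

```scala
object LazyDemo {
  // Counts how many times the mapping function has actually run.
  var evaluations = 0

  // Analogous to transformations: describing the pipeline runs nothing yet.
  val pipeline: Iterator[Int] = (1 to 5).iterator
    .map { x => evaluations += 1; x * 2 } // element-wise transformation
    .filter(_ > 4)

  val evaluatedBeforeAction: Int = evaluations

  // Analogous to an action: forcing the result runs the whole pipeline.
  val result: List[Int] = pipeline.toList
  val evaluatedAfterAction: Int = evaluations
}
```

One difference worth keeping in mind: an `Iterator` can be consumed only once, whereas an RDD can be recomputed from its lineage or cached for reuse.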
Pair RDDs
• Key-Value Pair RDDs
• Creating Pair RDDs
• Transformations on Pair RDDs
• Grouping, Joining, and Sorting on Pair RDDs
• Data Partitioning
• Determining the Partitioning of a Pair RDD
• Operations that Benefit from Partitioning
• Operations that Affect Partitioning
• PageRank Example
Advanced Concepts in Spark
• Accumulators
• Broadcast Variables
• Working on a Per-Partition Basis
Launching Spark on a Cluster
• Configure and launch a Spark cluster on AWS
• Configure and launch a Spark cluster on Microsoft Azure
Running Spark on a Cluster
• Spark Runtime Architecture
• Driver
• Executor
• Cluster Manager
• Components of Execution: Job, Stage, and Task
• Spark Web UI
• Driver and Executor Logs
• spark-submit Command
Caching and Persistence
• RDD Lineage
• Caching Overview
• Distributed Persistence
Spark Libraries
• Spark SQL
• Spark Streaming
• MLlib
• GraphX