Scala & Spark Online Training


Scala & Spark



The following topics will be covered in our Scala & Spark Online Training:

Copyright @ 2015 Learntek. All Rights Reserved.



What is Scala?
➢ Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. The name comes from “Scalable Language”. Scala is a hybrid language that smoothly integrates the features of object-oriented and functional programming, and it compiles to run on the Java Virtual Machine. Scala was created by Martin Odersky and released in 2003.


Why Scala?
• Scala is a type-safe JVM language that combines object-oriented and functional programming features in a concise, logical, and powerful language.
• Scala offers a “better Java” alternative: its syntax stays close to Java's, which keeps the learning curve low.
• Scala was created with the specific goal of improving on the features of Java that are restrictive, overly tedious, or frustrating.
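
As a rough illustration of how object-oriented and functional features can sit side by side in Scala, here is a minimal sketch. The names `Shape`, `Circle`, `Rect`, and `ShapeDemo.totalArea` are hypothetical and chosen only for this example; they are not part of the course material.

```scala
// Hypothetical example types; the names are illustrative only.
trait Shape {
  def area: Double
}

// Case classes give concise, immutable data definitions.
final case class Circle(radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius
}

final case class Rect(width: Double, height: Double) extends Shape {
  def area: Double = width * height
}

object ShapeDemo {
  // Functional style: map each shape to its area and sum the results,
  // without any mutable state.
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  def main(args: Array[String]): Unit = {
    val shapes = List(Rect(2.0, 3.0), Rect(1.0, 4.0), Circle(1.0))
    println(totalArea(shapes))
  }
}
```

The trait and case classes are the object-oriented part; `totalArea` is the functional part, and the compiler checks the types of both.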



What is Spark?
• Spark is a cluster computing technology designed for fast computation in Hadoop clusters. It builds on the Hadoop MapReduce model and extends it to support more types of computation efficiently, such as interactive queries and stream processing. Hadoop can serve Spark in two ways: storage and processing. Because Spark has its own cluster management, it uses Hadoop for storage only.


Why Spark?
• Spark, an Apache Software Foundation project, was introduced to speed up Hadoop-style computation.
• The main feature of Spark is its in-memory cluster computing, which greatly increases application processing speed.
• Spark is designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries, and streaming applications, which reduces the management burden of maintaining separate tools.



Introduction to Scala
• Overview of Scala
• Installing Scala
• Scala Basics
• IDE for Scala



Scala Programming
• Variables & Methods
• Literals
• Reserved Words
• Operators
• Precedence Rules
• If Expression
• For Expression
• Exception handling with Try Expression
• Match Expression
• While Loops
• Do-While Loops
• Implicit Conversion



Functions in Scala
• Methods
• First-Class Functions
• Higher-Order Methods
• Function Literal
• Partially Applied Function
• Tail Recursion
• Closure
• Currying
• Control Abstraction
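
To give a feel for several of the topics listed above, here is a minimal sketch covering a higher-order method, a function literal, currying, partial application, a closure, and tail recursion. The object `FunctionDemo` and its members are hypothetical names used only for illustration.

```scala
import scala.annotation.tailrec

// Hypothetical demo object; all names here are illustrative only.
object FunctionDemo {
  // Higher-order method: takes a function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Curried method: arguments are supplied in two parameter lists.
  def add(a: Int)(b: Int): Int = a + b

  // Partial application: fixing the first argument yields an Int => Int.
  val addFive: Int => Int = add(5)

  // Closure: the returned function literal captures `factor`.
  def multiplier(factor: Int): Int => Int = (x: Int) => x * factor

  // Tail recursion: the recursive call is the last action, so @tailrec
  // lets the compiler verify it can be compiled into a loop.
  @tailrec
  def factorial(n: Int, acc: Long = 1L): Long =
    if (n <= 1) acc else factorial(n - 1, acc * n)

  def main(args: Array[String]): Unit = {
    println(applyTwice(_ + 3, 10))
    println(addFive(2))
    println(multiplier(3)(4))
    println(factorial(5))
  }
}
```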



Traits & OOP in Scala
• Traits
• Classes & Objects
• Abstract Class
• Access Modifiers
• Functional Programming
• Scala Class Hierarchy
• Package and Imports



Case Class & Pattern Matching
• Pattern type
• Pattern Guard
• Sealed Class
• Option Type
• Extractor
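
The topics above fit together naturally in one small example. The sketch below assumes a hypothetical `Event` hierarchy and a hypothetical `Email` extractor; none of these names come from the course itself.

```scala
// Sealed trait: all subtypes live in this file, so the compiler can
// check that a match covers every case.
sealed trait Event
final case class Login(user: String) extends Event
final case class Purchase(user: String, amount: Double) extends Event

// Custom extractor: unapply splits "name@domain" into its two parts.
object Email {
  def unapply(s: String): Option[(String, String)] =
    s.split("@") match {
      case Array(name, domain) => Some((name, domain))
      case _                   => None
    }
}

object MatchDemo {
  def describe(e: Event): String = e match {
    case Login(Email(name, _))                  => s"login by $name"
    case Login(user)                            => s"login by $user"
    // Pattern guard: this case applies only when the condition holds.
    case Purchase(user, amount) if amount > 100 => s"large purchase by $user"
    case Purchase(user, _)                      => s"purchase by $user"
  }

  // Option type: there may be no purchase at all, so return Option.
  def firstPurchaseAmount(events: List[Event]): Option[Double] =
    events.collectFirst { case Purchase(_, amount) => amount }
}
```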



Scala Collection
• Immutable and Mutable collection
• Array
• Sets
• Lists
• Tuples
• Maps
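
As a small sketch of the immutable versus mutable distinction, along with tuples, sets, and maps, consider the following. `CollectionDemo` and its members are hypothetical names introduced only for this illustration.

```scala
import scala.collection.mutable

// Hypothetical demo object; all names here are illustrative only.
object CollectionDemo {
  // Immutable list: prepending returns a new list and leaves the original unchanged.
  val base: List[Int] = List(1, 2, 3)
  val extended: List[Int] = 0 :: base

  // Tuples pair values of different types; a Map can be built from pairs.
  def wordLengths(words: Seq[String]): Map[String, Int] =
    words.map(w => (w, w.length)).toMap

  // Sets drop duplicate elements.
  def distinctCount(xs: Seq[Int]): Int = xs.toSet.size

  // Mutable buffer: += changes the collection in place.
  def firstN(n: Int): mutable.ArrayBuffer[Int] = {
    val buf = mutable.ArrayBuffer.empty[Int]
    (1 to n).foreach(i => buf += i)
    buf
  }
}
```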



Introduction to Spark
• Problems with Traditional Large-Scale Systems
• Introducing Spark
• What is Spark?



Spark Basics
• Spark Installation
• Configure HDP 2.4 (or 2.5) on local machine
• Spark Shell
• Storage layers for Spark
• Overview of Spark architecture
• Initializing a SparkContext and building applications



IDEs for Spark Applications
• SBT and its overview
• IntelliJ
• Eclipse
• Resolving dependencies for Spark applications



RDDs
• RDD Basics
• RDD Transformations and Actions
• Lazy evaluation
• Element-wise transformations



Pair RDDs
• Key-Value Pair RDD
• Creating Pair RDDs
• Transformations on Pair RDD
• Grouping, Joining, Sorting on Pair RDD
• Data Partitioning
• Determining the partitioning of a Pair RDD
• Operations that benefit from partitioning
• Operations that affect partitioning
• PageRank Example



Advanced Concepts in Spark
• Accumulators
• Broadcast variables
• Working on a per-partition basis



Launching Spark on Cluster
• Configure and launch Spark Cluster on AWS
• Configure and launch Spark Cluster on Microsoft Azure



Running Spark on Cluster
• Spark Runtime Architecture
• Driver
• Executor
• Cluster Manager
• Components of Execution: Job, Stage and Task
• Spark Web URL
• Driver and Executor logs
• spark-submit command



Caching and Persistence
• RDD Lineage
• Caching Overview
• Distributed Persistence



Spark Algorithms
• Spark SQL
• Spark Streaming
• MLlib
• GraphX


