Apache Spark with Java 8





Why Spark?

Spark, an Apache Software Foundation project, was created to speed up Hadoop-style cluster computation. Its main feature is in-memory cluster computing, which greatly increases application processing speed. Spark is designed to cover a wide range of workloads, including batch applications, iterative algorithms, interactive queries and streaming applications, which reduces the management burden of maintaining separate tools.

Copyright @ 2019 Learntek. All Rights Reserved.



Apache Spark also has the following features:
- Speed: Spark can run an application on a Hadoop cluster up to 100 times faster in memory, and up to 10 times faster on disk, by reducing the number of disk read/write operations and keeping intermediate processing data in memory.
- Support for multiple languages: Spark provides more than 80 high-level operators for interactive querying, along with built-in APIs for application development in Java, Scala and Python.
- Advanced analytics: beyond map and reduce operations, Spark supports SQL queries, streaming data, machine learning (ML) and graph algorithms.



Why Java 8?

Lambda expressions, introduced in Java 8, bring functional programming support to the language. Java 8 also introduced the Stream API, which can be thought of as a collections-like framework for functional programming that processes elements without storing them. With lambda expressions, code can be written more concisely and elegantly. The learning curve is also smoother, since one only has to learn the Apache Spark API rather than Scala as well.
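To make the conciseness claim concrete, here is a minimal sketch in plain Java 8 (no Spark dependency). It writes the same comparator once as an anonymous inner class and once as a lambda, then runs a small Stream pipeline. The class and method names (`LambdaDemo`, `sortByLengthLambda`, and so on) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are illustrative only.
class LambdaDemo {

    // Pre-Java 8 style: the comparator is an anonymous inner class.
    static List<String> sortByLengthAnonymous(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    // Java 8 style: the same comparator as a lambda expression.
    static List<String> sortByLengthLambda(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return copy;
    }

    // Stream pipeline: filter and transform elements without storing
    // intermediate collections; only the final result is materialized.
    static List<String> longWordsUpperCase(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() >= minLength)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "java", "lambda", "rdd");
        System.out.println(sortByLengthAnonymous(words)); // [rdd, java, spark, lambda]
        System.out.println(sortByLengthLambda(words));    // [rdd, java, spark, lambda]
        System.out.println(longWordsUpperCase(words, 5)); // [SPARK, LAMBDA]
    }
}
```

Both sort methods return the same result; the lambda version simply drops the boilerplate around the single `compare` method.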



Apache Spark with Java: Overview of Java 8
- Overview of interfaces: static and default methods in an interface
- Anonymous inner classes
- Introduction to lambda expressions
- Functional interfaces and type inference
- Method references
- Composing lambdas
- Understanding closures
- Overview of streams
- Working with streams
- Infinite streams
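Several of the topics listed above fit into one short sketch in plain Java 8. The interface `Scorer` and the class `Java8Overview` are hypothetical names invented for this illustration; the library calls (`Function.andThen`, `Stream.iterate`, `limit`) are standard JDK APIs.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative only.
class Java8Overview {

    // Functional interface: exactly one abstract method, so a lambda or
    // method reference can implement it. It may still carry default and
    // static methods.
    @FunctionalInterface
    interface Scorer {
        int score(String text);

        // Default method: behaviour inherited by every implementation.
        default Scorer doubled() {
            return text -> 2 * score(text);
        }

        // Static method in an interface, implemented with a method reference.
        static Scorer byLength() {
            return String::length;
        }
    }

    // Composing lambdas: trim first, then take the length.
    static Function<String, Integer> trimmedLength() {
        Function<String, String> trim = String::trim;
        return trim.andThen(String::length);
    }

    // Closure: the returned lambda captures the effectively final `step`.
    static Function<Integer, Integer> adder(int step) {
        return x -> x + step;
    }

    // Infinite stream: Stream.iterate has no end, so limit() bounds it.
    static List<Integer> firstPowersOfTwo(int count) {
        return Stream.iterate(1, n -> n * 2)
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(Scorer.byLength().doubled().score("spark")); // 10
        System.out.println(trimmedLength().apply("  rdd "));            // 3
        System.out.println(adder(5).apply(10));                         // 15
        System.out.println(firstPowersOfTwo(5));                        // [1, 2, 4, 8, 16]
    }
}
```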



Apache Spark with Java: Introduction to Spark
- Introduction to Big Data
- The Big Data problem
- Scale-up vs. scale-out architecture
- Characteristics of scale-out
- Introduction to Hadoop, MapReduce and HDFS
- Introducing Spark



Hortonworks Data Platform (HDP) using VirtualBox
- Importing the HDP VM image using VirtualBox on a local machine
- Configuring HDP
- Overview of Ambari and its components
- Overview of services configuration using Ambari
- Overview of Apache Zeppelin
- Creating, importing and executing notebooks in Apache Zeppelin

IDEs for Spark Applications
- IntelliJ
- Eclipse
- Resolving dependencies for Spark applications



Spark Basics
- Spark Shell
- Overview of Spark architecture
- Storage layers for Spark
- Initializing a SparkContext and building applications
- Submitting a Spark application
- Use of the Spark History Server

Spark Components
- Spark driver process
- Spark executor
- SparkConf and SparkContext
- SparkSession object
- Overview of the spark-submit command
- Spark UI



RDDs
- Overview of RDD
- RDD and partitions
- Ways of creating an RDD
- RDD transformations and actions
- Lazy evaluation
- RDD lineage graph (DAG)
- Element-wise transformations
- Map vs. FlatMap transformation
- Set transformations
- RDD actions
- Overview of RDD persistence
- Methods for persisting an RDD
- Persisting an RDD with a storage option
- Illustration of caching on an RDD in the DAG
- Removal of a cached RDD
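Two items in the list above, lazy evaluation and map vs. flatMap, can be previewed locally with Java 8 streams. This is only an analogy under stated assumptions: the sketch uses `java.util.stream`, not Spark's RDD API, and the class `LazyAnalogy` and its methods are hypothetical. The parallel is that intermediate stream operations, like RDD transformations, do no work until a terminal operation (the counterpart of an RDD action) runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java analogy, NOT Spark code; names are illustrative only.
class LazyAnalogy {

    // map: exactly one output element per input element
    // (here, one word count per line).
    static List<Integer> wordsPerLine(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(" ").length)
                .collect(Collectors.toList());
    }

    // flatMap: zero or more output elements per input element,
    // flattened into a single sequence of words.
    static List<String> allWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // Lazy evaluation: defining the pipeline runs nothing; the mapping
    // function is only invoked once the terminal operation executes.
    static List<String> lazinessLog(List<String> lines) {
        List<String> log = new ArrayList<>();
        Stream<String> pipeline = lines.stream().map(line -> {
            log.add("mapped " + line);
            return line.toUpperCase();
        });
        log.add("pipeline defined");
        pipeline.collect(Collectors.toList()); // terminal operation triggers the work
        return log;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or", "not to be");
        System.out.println(wordsPerLine(lines));
        System.out.println(allWords(lines));
        System.out.println(lazinessLog(Arrays.asList("a", "b")));
    }
}
```

In the log returned by `lazinessLog`, the "pipeline defined" entry comes before any "mapped" entry, which is the behaviour the lazy-evaluation topic describes.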



Pair RDDs
- Overview of key-value pair RDDs
- Ways of creating pair RDDs
- Transformations on pair RDDs
- reduceByKey(), foldByKey(), mapValues(), flatMapValues(), keys() and values() transformations
- Grouping, joining and sorting on pair RDDs
- reduceByKey() vs. groupByKey()
- Pair RDD actions
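The reduceByKey() vs. groupByKey() topic can be approximated in plain Java 8 collectors. Again this is an analogy, not Spark code: `PairAnalogy`, `pair`, `sumViaGrouping` and `sumViaReducing` are hypothetical names. The grouping variant first materializes every value per key and sums afterwards, while the reducing variant merges values into a running total as they arrive, which mirrors why reduceByKey() in Spark typically moves less data than groupByKey().

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java analogy, NOT Spark code; names are illustrative only.
class PairAnalogy {

    // Helper that builds one key-value pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    // groupByKey-style: collect the full list of values for each key,
    // then sum each list in a second step.
    static Map<String, Integer> sumViaGrouping(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = pairs.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
        Map<String, Integer> sums = new HashMap<>();
        grouped.forEach((key, values) ->
                sums.put(key, values.stream().mapToInt(Integer::intValue).sum()));
        return sums;
    }

    // reduceByKey-style: merge each value into a running total per key,
    // so no per-key list is ever built.
    static Map<String, Integer> sumViaReducing(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
                pair("spark", 1), pair("java", 1), pair("spark", 1));
        System.out.println(sumViaGrouping(pairs));
        System.out.println(sumViaReducing(pairs));
    }
}
```

Both methods return the same totals; the difference lies in the intermediate state each one builds.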



Launching Spark on a Cluster
- Configuring and launching a Spark cluster on Google Cloud
- Configuring and launching a Spark cluster on Microsoft Azure

Logging and Debugging a Spark Application
- Setting up a Windows environment for executing a Spark application from an IDE
- Steps for using the slf4j logging mechanism in a Spark application
- Attaching a debugger to a Spark application
- Example of debugging a Spark application running inside a cluster



Spark Application Architecture
- Spark application distributed architecture
- Spark application submission mode
- Overview of cluster managers
- Example of using the Standalone cluster manager
- Driver and its responsibilities
- Overview of jobs, stages and tasks
- Spark job hierarchy
- Executor
- The spark-submit command and its submission options
- YARN cluster manager
- YARN architecture
- Client and cluster deploy modes



Advanced Concepts in Spark
- Accumulators
- Broadcast variables
- RDD partitioning
- Repartitioning an RDD
- Determining an RDD's partitioner
- Partition-based operations such as mapPartitions, mapPartitionsWithIndex and mapPartitionsToPair



Spark SQL
- Introduction to Spark SQL
- Creating a SparkSession with Hive support
- DataFrame
- Ways of creating a DataFrame
- Registering a DataFrame as a view
- DataFrame transformations API
- DataFrame SQL statements
- Aggregate operations
- DataFrame actions
- Catalyst optimizer
- Limitations of DataFrame
- Introduction to Dataset



Spark SQL (continued)
- Introduction to Encoder
- Creating a Dataset
- Functional transformations on a Dataset
- Loading CSV, JSON and Parquet files in Spark SQL
- Loading and saving data from/in Hive, JDBC, HDFS and Cassandra
- Introduction to User-Defined Functions (UDF)
- Customizing a UDF
- Usage of a UDF in the DataFrame transformations API
- Usage of a UDF in a Spark SQL statement
- Introduction to window functions
- Steps for defining a window function
- Illustration of window function usage



Spark SQL (continued)
- Introduction to UDAF
- Customizing a UDAF
- Illustration of customized UDAF usage



Basic Spark Streaming
- Introduction to data streaming
- Spark Streaming framework
- Spark Streaming and micro-batches
- Introduction to DStreams
- DStreams and RDDs
- Word count example using a socket text stream
- Streaming with Twitter feeds
- Setting up a Twitter app
- Resolving the Twitter dependency in a Spark Streaming application



Basic Spark Streaming (continued)
- Steps for creating an uber JAR
- Example of extracting hashtags from tweet data
- Troubleshooting Twitter streaming issues in a Spark application
- Steps for creating a Spark Streaming application
- Architecture of Spark Streaming
- Stateless transformations
- Twitter streaming examples using stateless transformations
- Introduction to stateful transformations
- Window duration and slide duration
- Window operations
- Naive and inverse window reduce operations
- Checkpointing
- Tracking the state of an event using the updateStateByKey operation



Basic Spark Streaming (continued)
- Interacting directly with RDDs using the transform() operation
- Example of HDFS file streaming
- Example of Spark-Kafka interaction
- Saving DStreams to an external file system

Prerequisites for Apache Spark with Java 8
- An understanding of object-oriented programming (OOP) concepts and programming constructs in Java is required.
- Programming experience in Java 7 is mandatory.
- An understanding of, or experience with, lambda expressions in Java 8 is an added advantage.



For more training information, contact us:
Email: info@learntek.org
USA: +1734 418 2465
INDIA: +40 4018 1306, +7799713624
