Apache Flink
The following topics will be covered in our Apache Flink Online Training:
Copyright @ 2015 Learntek. All Rights Reserved.
What is Apache Flink?
➢ Apache Flink is an open-source stream processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming dataflow engine written in Java and Scala. Apache Flink’s dataflow programming model provides event-at-a-time processing on both finite and infinite datasets. At a basic level, Flink programs consist of streams and transformations.
Conceptually, a stream is a (potentially never-ending) flow of data records, and a transformation is an operation that takes one or more streams as input and produces one or more output streams as a result. Programs can be written in Java, Scala, Python, and SQL, and are automatically compiled and optimized into dataflow programs that are executed in a cluster or cloud environment.
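The stream-and-transformation idea can be sketched in plain Python. This is a conceptual model only, not Flink's API: the source `sensor_readings`, the two transformations, and the threshold value are all hypothetical names chosen for illustration, and a short finite list stands in for a never-ending stream.

```python
from typing import Iterator, Tuple

Reading = Tuple[str, float]  # (sensor id, temperature)


def sensor_readings() -> Iterator[Reading]:
    """Hypothetical source: a finite stand-in for an unbounded stream."""
    yield from [("s1", 21.5), ("s2", 30.0), ("s1", 38.2), ("s2", 19.9)]


def to_fahrenheit(stream: Iterator[Reading]) -> Iterator[Reading]:
    """Transformation: takes one stream in and produces one stream out."""
    for sensor_id, celsius in stream:
        yield sensor_id, celsius * 9 / 5 + 32


def only_hot(stream: Iterator[Reading], threshold: float) -> Iterator[Reading]:
    """Transformation: keeps only the records above the threshold."""
    for sensor_id, temperature in stream:
        if temperature > threshold:
            yield sensor_id, temperature


# Chaining generators builds a small dataflow. Records are pulled through
# one at a time, which mirrors event-at-a-time processing.
pipeline = only_hot(to_fahrenheit(sensor_readings()), threshold=95.0)
hot = list(pipeline)
```

Because generators are lazy, nothing is computed until the final `list(...)` call consumes the pipeline, so the same code would work unchanged if the source never ended and the consumer read records in a loop instead.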
Why Apache Flink?
• Flink provides a high-throughput, low-latency streaming engine as well as support for event-time processing and state management. Flink applications are fault-tolerant in the event of machine failure and support exactly-once semantics. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner. Flink’s pipelined runtime system enables the execution of bulk/batch and stream processing programs. Furthermore, Flink’s runtime supports the execution of iterative algorithms natively.
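To make event-time processing and state management a little more concrete, here is a minimal plain-Python sketch. It is not Flink code: the event layout, the `WINDOW_MS` window size, and the function name are assumptions made for illustration. Each record carries its own timestamp, and a per-key running total is kept as state, so a record that arrives out of order is still counted in the time window it belongs to.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, int, int]  # (key, event_time_ms, value), hypothetical layout

WINDOW_MS = 1000  # hypothetical fixed window size in milliseconds


def sum_by_key_and_window(events: Iterable[Event]) -> Dict[Tuple[str, int], int]:
    """Keep a running sum per (key, window start), driven by event time."""
    state: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, event_time_ms, value in events:
        # The window is chosen from the timestamp inside the record,
        # not from the order in which records arrive.
        window_start = event_time_ms - event_time_ms % WINDOW_MS
        state[(key, window_start)] += value
    return dict(state)


events = [
    ("a", 100, 1),
    ("b", 1200, 5),
    ("a", 900, 2),   # arrives after a later event, still lands in window 0
    ("a", 1500, 4),
]
totals = sum_by_key_and_window(events)
```

This sketch leaves out what a real engine has to add on top: deciding when a window is complete (Flink uses watermarks for this) and periodically checkpointing the state so that it can be restored after a machine failure.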
Flink Introduction
• Architecture
• Distributed Execution
• Job Manager
• Task Manager
• Features
• Deploying Flink on Google Cloud and AWS
Data Stream API
• Execution environment
• Data sources
• Transformations
• Data sinks
• Connectors
Batch Processing API
• Data sources
• Transformations
• Broadcast Variable
• Connectors to various Systems
• Iterations
Structured data handling using the Table API
• Registering tables
• Accessing the registered table
• Operators
• Data types
• SQL
Complex Event Processing
• Introduction to CEP and Flink CEP
• Event Streams
• Pattern API
• Continuity
• Selecting from Pattern
Graph API
• Flink Graph Library – Gelly
• Graph Representation
• Graph Properties
• Graph Transformations
• Graph Mutations
• Iterative Graph Processing
• Scatter-Gather Processing
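As a preview of the scatter-gather topic, the model can be sketched in plain Python using single-source shortest paths as the example. This is an illustration of the idea only and does not use Gelly's API: the function name, the edge layout, and the `max_supersteps` limit are assumptions. In each superstep, vertices whose value changed scatter messages along their outgoing edges, and each receiving vertex then gathers its messages and updates its own value.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (source vertex, target vertex, weight)


def scatter_gather_sssp(
    vertices: Iterable[str],
    edges: List[Edge],
    source: str,
    max_supersteps: int = 10,  # hypothetical safety limit
) -> Dict[str, float]:
    """Single-source shortest paths as repeated scatter and gather phases."""
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0.0
    active = {source}  # vertices whose value changed in the last superstep

    for _ in range(max_supersteps):
        if not active:
            break  # converged: nobody has anything new to send

        # Scatter phase: active vertices send candidate distances to neighbors.
        inbox: Dict[str, List[float]] = {}
        for src, dst, weight in edges:
            if src in active:
                inbox.setdefault(dst, []).append(dist[src] + weight)

        # Gather phase: each vertex keeps the best candidate it received.
        active = set()
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < dist[vertex]:
                dist[vertex] = best
                active.add(vertex)

    return dist


edges = [("A", "B", 1.0), ("A", "C", 4.0), ("B", "C", 2.0), ("C", "D", 1.0)]
distances = scatter_gather_sssp(["A", "B", "C", "D"], edges, source="A")
```

Only vertices that changed in the previous superstep send messages, so the work per superstep shrinks as the values settle, and the loop ends as soon as no vertex updates.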
Integration between Flink and Hadoop
• Flink-YARN Session
• Job Submission to Flink
• Execution of a Flink job on YARN
• Flink and YARN interaction details