Get Started with Hadoop, Hive, and HiveQL


Copyright © JanBask Training. All rights reserved

www.JanBaskTraining.com


Career Options of Hadoop Big Data Certification

✓ Hadoop to HiveQL
✓ Uses of Hadoop
✓ Hive
✓ Remember that Hive is not
✓ Uses of HiveQL
✓ Major Reasons to Use Hadoop for Data Science
✓ Bottom Line



Hadoop to HiveQL

Apache Hadoop is an open-source, fault-tolerant, and scalable framework, written in Java, for storing and processing large amounts of data. Hadoop supports the data lake approach, in which data is stored in its original, raw format. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
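To make the "single server to thousands of machines" idea concrete: Hadoop's file system (HDFS) splits each file into fixed-size blocks and keeps several replicas of every block on different machines. The sketch below computes such a layout, assuming the common defaults of a 128 MB block size and a replication factor of 3 (set by `dfs.blocksize` and `dfs.replication`). The round-robin placement and the node names are simplifications for illustration, not HDFS's actual rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # assumed: common HDFS default (dfs.blocksize)
REPLICATION = 3      # assumed: common HDFS default (dfs.replication)


def plan_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into blocks and assign each block's replicas to distinct nodes.

    Placement is simple round-robin for illustration only; real HDFS
    uses a rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    layout = []
    for b in range(num_blocks):
        # The last block may be smaller than block_size_mb.
        size = min(block_size_mb, file_size_mb - b * block_size_mb)
        # Consecutive node indices are distinct because replication <= len(nodes).
        replicas = [nodes[(b + r) % len(nodes)] for r in range(replication)]
        layout.append((b, size, replicas))
    return layout


layout = plan_blocks(300, ["node1", "node2", "node3", "node4"])
for block_id, size, replicas in layout:
    print(block_id, size, replicas)
```

With four hypothetical nodes, a 300 MB file becomes three blocks (128, 128, and 44 MB), each stored on three different nodes, which is what allows work on one file to be spread across several machines.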



Uses of Hadoop

▪ There is no need to preprocess data before storing it (you can store as much data as you want and decide later how to use it).
▪ You can easily grow your system to handle more data by adding nodes (only a little administration is required).
▪ It is well suited to handling millions or billions of transactions.
▪ Many cities, states, and countries use Hadoop to analyze data. For example, it can help identify traffic jams so that they can be controlled (the concept of a Smart City).
▪ Many businesses also use big data to optimize their performance effectively.
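Storing data first and deciding later how to use it is often called schema-on-read. A minimal sketch of the idea in plain Python (no Hadoop involved; the traffic-sensor lines and field names are hypothetical): raw lines are kept exactly as they arrived, and a schema is applied only when a particular question is asked.

```python
# Raw records stored exactly as received (hypothetical traffic-sensor log lines).
RAW_LINES = [
    "2024-01-05T08:00,junction_12,87",
    "2024-01-05T08:00,junction_7,12",
    "2024-01-05T08:05,junction_12,95",
    "bad line that no reader expected",
]


def read_with_schema(lines, schema):
    """Apply a schema at read time; rows that do not fit are skipped by this reader."""
    for line in lines:
        parts = line.split(",")
        if len(parts) != len(schema):
            continue  # malformed row: nothing was rejected at load time
        try:
            yield {name: cast(value) for (name, cast), value in zip(schema, parts)}
        except ValueError:
            continue  # a field could not be converted to the requested type


# One possible schema, decided long after the data was stored.
traffic_schema = [("time", str), ("junction", str), ("vehicles", int)]
rows = list(read_with_schema(RAW_LINES, traffic_schema))
busy = {r["junction"] for r in rows if r["vehicles"] > 50}
print(busy)
```

Hive applies the same principle: a table definition is laid over files that already sit in HDFS and is only used when a query reads them, so a different schema can be defined later over the same raw data.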



Hive

▪ Apache Hive is a data warehouse software project built on top of Apache Hadoop to provide data query and analysis.

▪ It uses a declarative, SQL-like language called HiveQL (HQL).

▪ Hive allows programmers who are familiar with MapReduce to plug in their own custom mappers and reducers to perform more sophisticated analysis.
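Under the hood, classic Hive compiles a query into MapReduce jobs (newer versions can also use Tez or Spark as the execution engine). The sketch below imitates, in plain Python, the three phases such a job goes through for a query like `SELECT dept, COUNT(*) FROM employees GROUP BY dept`. The table, its columns, and the rows are hypothetical, and this is a simulation of the idea, not Hive's actual query planner.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept).
EMPLOYEES = [("asha", "sales"), ("ben", "eng"), ("chen", "eng"), ("dara", "sales"), ("eli", "eng")]


def map_phase(rows):
    """Map: emit (group key, 1) for every input row."""
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs):
    """Shuffle: bring all values with the same key together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: aggregate each key's values, like COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle_phase(map_phase(EMPLOYEES)))
print(counts)
```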



Ecosystem Components

The functional features of Hive are:
▪ Data Summarization
▪ Query
▪ Analysis



HQL

▪ The Hive Query Language (HQL) is a SQL-like interface used to query data stored in databases and file systems that integrate with Hadoop. It supports simple SQL-like functions such as CONCAT, SUBSTR, and ROUND, and aggregate functions such as SUM, COUNT, and MAX.
▪ It also supports clauses such as GROUP BY and SORT BY, and it is possible to write user-defined functions (UDFs) and call them from HQL. Basically, HQL builds on well-known concepts from the relational database world, such as tables, rows, columns, and schemas.
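One detail worth knowing about SORT BY: in Hive it sorts rows only within each reducer, so with more than one reducer the combined output is not globally ordered, whereas ORDER BY produces a total order by sending all rows through a single reducer. The Python sketch below simulates that difference; the sales amounts are hypothetical, and the round-robin routing of rows to reducers is a simplifying assumption, not Hive's exact distribution logic.

```python
# Hypothetical sales amounts to be sorted.
AMOUNTS = [42, 7, 99, 15, 63, 3, 88, 21]


def order_by(values):
    """ORDER BY: one total order over all rows (Hive uses a single reducer for this)."""
    return sorted(values)


def sort_by(values, num_reducers):
    """SORT BY: rows are split across reducers and each reducer sorts only its own share.

    Rows are routed round-robin here purely for illustration.
    """
    partitions = [[] for _ in range(num_reducers)]
    for i, v in enumerate(values):
        partitions[i % num_reducers].append(v)
    # Each reducer writes its own sorted output; concatenating them is not globally sorted.
    return [sorted(p) for p in partitions]


print(order_by(AMOUNTS))    # [3, 7, 15, 21, 42, 63, 88, 99]
print(sort_by(AMOUNTS, 2))  # [[42, 63, 88, 99], [3, 7, 15, 21]]
```

Hive also offers DISTRIBUTE BY to control which reducer receives which rows, and CLUSTER BY as a shorthand for DISTRIBUTE BY plus SORT BY on the same columns.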



Uses of HiveQL

▪ HQL is closely modeled on SQL, so SQL users can pick it up quickly
▪ HQL allows programmers to plug in custom mappers and reducers
▪ HQL is scalable, familiar, extensible, and fast to use
▪ It provides indexes to speed up queries (indexing was removed in Hive 3.0)
▪ HQL offers user-defined function (UDF) APIs that can be used to add custom behavior to the query engine
▪ It fills the need for a higher-level interface on top of Hadoop's low-level MapReduce API
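Plugging in custom mappers and reducers works through Hive's streaming support: after making a script available with `ADD FILE clean_names.py;`, a query such as `SELECT TRANSFORM(name, city) USING 'python3 clean_names.py' AS (name, city) FROM users;` pipes rows through it. By default the script receives each row on stdin as tab-separated text and must write tab-separated rows to stdout. A minimal sketch of such a script follows; the script name, the `users` table, and its columns are hypothetical, and treating `\N` as the incoming NULL marker is an assumption based on Hive's default text format.

```python
import sys


def transform(lines):
    """Normalize each (name, city) row: trim whitespace and title-case both fields.

    Hive's TRANSFORM passes rows as tab-separated text, one row per line.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # "\N" is assumed to be how NULL arrives; pass it through unchanged.
        cleaned = [f if f == r"\N" else f.strip().title() for f in fields]
        yield "\t".join(cleaned)


if __name__ == "__main__":
    for out_line in transform(sys.stdin):
        print(out_line)
```

Keeping the row logic in a plain generator function makes it easy to test locally with a few sample lines before running it inside a Hive query.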



Major Reasons to Use Hadoop for Data Science

When you have to deal with a large amount of data, Hadoop is a strong option. When you plan to implement Hadoop on your data, the first step is to understand how complex the data is and the rate at which it is expected to grow; this is where cluster planning comes in. Whether a company's data is measured in gigabytes or terabytes, Hadoop can help.
▪ Different types of data
▪ Numeric data
▪ Nominal data
▪ Different specific applications
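Cluster planning can start as back-of-the-envelope arithmetic: project today's data volume forward with an expected growth rate, multiply by the HDFS replication factor, and leave headroom for temporary and intermediate data. Every number below (50 TB starting size, 5% monthly growth, replication factor 3, 25% headroom, 24 TB usable disk per node) is a hypothetical planning assumption, not a recommendation.

```python
import math


def nodes_needed(current_tb, monthly_growth, months, replication=3,
                 headroom=0.25, usable_tb_per_node=24.0):
    """Estimate how many data nodes are needed after `months` of compound growth.

    All defaults are illustrative assumptions for planning, not Hadoop requirements.
    """
    projected_tb = current_tb * (1 + monthly_growth) ** months
    raw_tb = projected_tb * replication   # every block is stored `replication` times
    total_tb = raw_tb / (1 - headroom)    # keep a share of disk free for scratch data
    return projected_tb, math.ceil(total_tb / usable_tb_per_node)


projected, nodes = nodes_needed(current_tb=50, monthly_growth=0.05, months=12)
print(f"projected data: {projected:.1f} TB, data nodes needed: {nodes}")
```

Under these assumptions, 50 TB growing 5% a month reaches roughly 89.8 TB in a year, which with three replicas and 25% headroom calls for 15 data nodes; changing any assumption changes the answer, which is why the growth rate has to be estimated first.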



Bottom Line

Hadoop has become the de facto platform for data science and is the gateway to big data technologies. It underpins tools such as Hive, and Spark is commonly deployed on Hadoop clusters. As per Forbes, “Hadoop market is expected to reach $99.318 by 2022 at a CAGR of 42.1 percent.” So, this is the right time to build your skills in the field of big data. Happy Reading!



Thank you! Happy learning!
