Apache Hadoop 2.0 Ops Management Lab Guide


APACHE HADOOP 2.0 OPS MANAGEMENT STUDENT LAB GUIDE

Table of Contents:

Copyright © 2014 XcelFrameworks LLC. All Rights Reserved.


Lab 1: Installing Ambari for Hadoop
Lab 2: Installing Hadoop Cluster using Ambari
Lab 3: Understanding the Ambari Web GUI
Lab 4: Working with HBase and Pig
Lab 5: Configuring YARN Capacity Scheduler using Ambari
Lab 6: Setting up NameNode High Availability
Lab 7: Installing Storm on YARN for Real Time Processing
Additional Labs for Cloud:
Lab 8: Installing Ambari on Cloud
Lab 9: Installing Apache Hadoop 2.0


LAB 1: INSTALLING AMBARI FOR MANAGING HADOOP

We have to install Ambari in order to install the Hadoop components. After performing the hardware and software prerequisite checks, follow the steps described below to install Ambari and get an automated installation and configuration of Hadoop clusters. Step 1: Once the repositories are available, perform the following test to confirm you have the right packages. Use the commands shown below to ensure you have the required repository and that PostgreSQL is installed and running; Ambari performs the installation, but if PostgreSQL is not already installed, be careful about version support. [Note: Take your mentor's support for repositories, since you may get dependency errors.]
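A minimal sketch of such a check on RHEL/CentOS; the commands are standard, but the exact repository and package names vary by environment:

$ yum repolist | grep -i ambari      # confirm the Ambari repository is visible
$ rpm -qa | grep -i postgresql       # check whether PostgreSQL packages are present
$ service postgresql status          # if installed, confirm the service state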

Step 2: Now you have to test whether the prerequisite services are running or not; if not, install the services depending on the RedHat/CentOS/Ubuntu version that you are using. Shown below is a test of the ntp service, which is required for synchronizing the clocks of all the hosts.
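On RHEL/CentOS, a typical check-and-enable sequence looks like the following (on Ubuntu the service is named ntp rather than ntpd):

$ service ntpd status                # check whether ntpd is running
$ chkconfig ntpd on                  # enable ntpd at boot
$ service ntpd start                 # start it now if it was stopped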


Step 3: To ensure the Ambari Server automatically installs Ambari Agents on all your cluster hosts, you must set up password-less SSH connections between the main installation (Ambari Server) host and all other machines. The Ambari Server host acts as the client and uses the key pair to access the other hosts in the cluster to install the Ambari Agent. Generate the public and private SSH keys on the Ambari Server host by running the command below:

$ ssh-keygen

Step 4: Make the keys available and set the right permissions on them as specified below.
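A sketch of Step 4, assuming the key was generated as ~/.ssh/id_rsa on the Ambari Server host; the agent host name is a placeholder:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # authorize the key locally
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
$ ssh-copy-id root@agent-host-1                      # repeat for each cluster host (hypothetical name)
$ ssh root@agent-host-1                              # verify password-less login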

Step 5: When all the steps are executed successfully, use the following command to start the installation of Ambari.
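Assuming a RHEL/CentOS host with the Ambari repository configured as in Step 1, the install command is:

$ yum install ambari-server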


Step 6: When the installation is complete, you'll see the following message as specified in the screenshot.

Step 7: When Ambari is installed, we need to start setting up the server as specified below:

If this step fails, ensure the firewall/iptables and SELinux state is as specified in the diagram above.

Note: For Ambari to communicate with the hosts it deploys to and manages during setup, certain ports must be open and available. The easiest way to do this is to temporarily disable iptables:

chkconfig iptables off
/etc/init.d/iptables stop

You can restart iptables after the setup is complete. Also ensure:
• SELinux is temporarily disabled for the Ambari setup to function. Run the following command on each host in your cluster:
setenforce 0
• On the RHEL/CentOS installation host, if PackageKit is installed, open /etc/yum/pluginconf.d/refresh-packagekit.conf with a text editor and make this change:
enabled=0

Step 8: This step is to accept the options Ambari prompts for to install the JDK. Remember, you can install your own JDK and point Ambari to use it. In our installation we are using the JDK installed by Ambari.
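If you do want your own JDK, ambari-server setup accepts a custom Java home via the -j option; the path below is hypothetical:

$ ambari-server setup -j /usr/java/jdk1.7.0_45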

Step 9: When everything is done as specified above, you will get the successfully installed message.

Step 10: Start the Ambari server and validate the installation as specified below.
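The commands behind that screenshot are the standard ones:

$ ambari-server start
$ ambari-server status               # should report the server as running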

Step 11: Validate the installation: Open a browser and enter http://localhost:8080; you will see the browser launch the login page.

Step 12:

Enter the user ID and password. The default is admin/admin.

Step 13: Click on Login and you will find the Config and Install webpage available.

The steps above ensure Ambari is installed on your machine. In the next lab we will go through the steps of installing and getting the Hadoop cluster up and running with various services.


LAB 2: INSTALLING HADOOP CLUSTER USING AMBARI

Step 1:

Step 1: Log in to Ambari using the previous lab's instructions if you are not already logged in. Step 2: Follow the wizard to configure the cluster, as specified below.

Step 3: Enter the name of the cluster in the text field and click "Next". We entered "Webage" as the name. Always give the cluster a logical name.

Step 4: When you click "Next", you get to another screen where you select the HDP version, as specified below.


Step 5: Click on the repository configuration and select the right repos for the operating system in use. In this case we are using Red Hat Enterprise Linux 6.4.

Step 6: Once you move to the next step, enter the name of the host. In this case, it is "localhost.localdomain". We would normally make an entry in /etc/hosts mapping the IP address to the DNS name; to keep it simple we use the default entries. Also select the id_rsa file to load the private key for SSH, as specified below.
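As a sketch, a single-node /etc/hosts can stay with the defaults; for a multi-node cluster you would add one line per host (the address below is hypothetical):

127.0.0.1      localhost.localdomain localhost
192.168.1.11   master1.example.com master1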



Step 7: Click "Next"; you may have to readjust some values. If you encounter any problem, consult your mentor. Click "Next" again and you get the host registration dialog.

If any hosts were selected in error, you can remove them by selecting the appropriate checkboxes and clicking the grey "Remove Selected" button. To remove a single host, click the small white "Remove" button in the Action column. At the bottom of the screen, you may notice a yellow box that indicates some warnings were encountered during the check process. For example, your host may already have had a copy of wget or curl. On the same page you can get access to a Python script that can help you clear any issues you may encounter and lets you "Rerun Checks". When you are satisfied with the list of hosts, click "Next".

Step 8: Choose the services that you want to install and configure. Hortonworks Data Platform is made up of a number of services. You must at a minimum install HDFS, but you can decide which of the other services you want to install. If you want to use Ambari for monitoring your cluster, make sure you select Nagios and Ganglia. If you do not select them, you get a warning popup when you finish this section. If you are using other monitoring tools, you can ignore the warning. The diagram below shows the successful registration of the hosts.

Select the desired services as follows.

Step 9: Select "all" to preselect all the items, or "minimum" to preselect only HDFS. Use the checkboxes to unselect (if you chose "all") or select (if you chose "minimum") to arrive at your desired list of components. Step 10: In this step, we have to assign masters as specified below. To keep the labs simple and work within a limited VM, we are using a single host to understand the services.

Step 11: The Ambari install wizard attempts to assign the slave components (DataNodes, NodeManagers and RegionServers) to the appropriate hosts in your cluster. It also attempts to select hosts for installing the appropriate set of clients. Use "all" or "none" to select all of the hosts in the column or none of the hosts, respectively. If a host has a red asterisk next to it, that host is also running one or more master components. Hover your mouse over the asterisk to see which master components are on that host. Fine-tune your selections using the checkboxes next to specific hosts. Step 12: Below are some of the other services which we can select.


Step 13: Assign Slaves and Clients. As an option you can start the HBase REST server manually after the installation process is complete. It can be started on any host that has the HBase Master or the Region Server installed. If you attempt to start it on the same host as the Ambari server, however, you need to start it with the -p option, as the default port is 8080 and that conflicts with the Ambari Web default port.
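A sketch of starting the REST server on an alternate port to avoid the 8080 conflict; the port chosen here is arbitrary:

$ su - hbase
$ hbase rest start -p 60080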

Step 14: Customize the services as specified below.

Step 15: To eliminate the errors flagged for Hive, Oozie and Nagios, we need to configure them as specified below.


Repeat the task for Oozie and Hive and click Next to start the installation process as specified below. This process may take some time and may need to be retried until all components install successfully. Step 16: The final step is to review and start the installation process. When you proceed you may find numerous errors. Analyze the dependencies and, depending on the operating system, select and install them. Since Ambari tries to start the services, you get the final screen as specified below.


Step 17: Go to summary and you will find the status of the installed services as specified below.

Step 18: Click on Complete and you will find the status and the dashboard as specified below.


Step 19: To validate the installation, open http://localhost:50070 in a browser, as specified below.

Step 20: Click on "Browse the file system" and you will find the directories created in HDFS as specified below.


This confirms that the required processes are running and installation is complete.

LAB 3: UNDERSTANDING THE AMBARI WEB GUI



When the Ambari web console is launched, navigate and explore the various options available there. This lab will guide and help you to use the Ambari console effectively. Step 1: Observe the dashboard options as specified below.

You will find Dashboard, Heatmaps, Services, Hosts and Admin in the panel. The dashboard provides you with various cluster status and metrics, as specified in the diagram above, along with CPU usage, load and heap usage. You can observe a few services that are in doubt, like HBase and Ganglia, with the number of alerts specified. Step 2: Click on YARN to see the service status and health as specified below.

Ensure the Summary displays the ResourceManager, NodeManagers and YARN clients in the right state and running. Also observe the Alerts and Health Checks results as specified. Step 3: Select Heatmaps to see the resource usage as specified below.

Step 4: Select the metrics and the factor that you want to view. We can see Host, HDFS, YARN and HBase there, along with various other metrics. Select each of them one by one to record the usage scenario as specified below.

Step 5: Analysis of HDFS: To view read, write, GC and heap usage, select HDFS and the metrics as specified below.

Step 6: Analyzing YARN: View the process status by selecting the right metrics as specified below.


Step 7: Analyze HBase with the following metrics to visualize HBase regions, queue sizes and memstore metrics, as specified below.

Step 8: Analyze the monitoring services, Ganglia and Nagios, as specified below.


Step 9: Also observe HDFS to explore its configuration. It is suggested to make a note of the various directories and metrics.

Step 10: Check out the Secondary NameNode and DataNode as specified below.


Step 11: Also ensure WebHDFS is enabled and has the right heap size and metrics set.
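A quick way to confirm WebHDFS is answering is the REST LISTSTATUS call; the /user path is just an example:

$ curl -i "http://localhost:50070/webhdfs/v1/user?op=LISTSTATUS"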

Step 12: Configuring logs: To configure logging, select the Advanced tab as specified below.


Step 13: Check the settings for block size, heartbeat interval and transfer threads as specified below.

Step 14: Creating Ambari users: By default the Ambari installation provides admin/admin as the username and password. You can use the Admin option to configure Users, Security, Cluster, High Availability and Misc services.


Step 15: Getting cluster details: To get the services and their versions select the cluster option as specified below.

Step 16: High Availability is one of the features of Hadoop 2.0 version. Ambari provides an easy way of configuring High Availability as specified below.


Step 17: Controlling services from Ambari: We can start, stop or control various options as specified below.

You can select the action that you want to perform on the services of your choice. This lab provides a proper understanding of using Ambari to manage the various services available in HDP 2.0.


LAB 4: WORKING WITH HBASE AND PIG


Step 1: Go to the dashboard of the Ambari GUI and ensure that the ZooKeeper and HBase services are running. Until this step we have not started these services in our labs, so you will find the status as shown below.

Step 2: Start ZooKeeper and HBase and ensure that both are green, as specified below.


Step 3: When HBase has started, you can launch the HBase shell and test it by creating tables and putting some values into them. The picture below elaborates this. First check the services that are running, then switch to the hbase user and open the HBase shell as specified below. Ensure you switch to the hbase user before using any HBase command.

Step 4: Create tables in HBase as specified below.


Step 5: In the table named "customers" above, we maintain the personal details as a column family. To confirm the construction of the table, we will use the following commands to put data into the table and view its content.
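A sketch of such a shell session; the row key and values are made-up sample data:

hbase(main):001:0> create 'customers', 'personal'
hbase(main):002:0> put 'customers', 'row1', 'personal:name', 'John Doe'
hbase(main):003:0> put 'customers', 'row1', 'personal:city', 'Toronto'
hbase(main):004:0> scan 'customers'
hbase(main):005:0> get 'customers', 'row1'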

Step 6: Finally we will delete the table. In order to delete the table we need to first disable it as specified below.
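The disable-then-drop sequence looks like this, with list confirming the table is gone:

hbase(main):006:0> disable 'customers'
hbase(main):007:0> drop 'customers'
hbase(main):008:0> list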


Finally, the list shows no tables. This confirms that HBase is running fine.

Working with Pig: Pig runs on Hadoop and makes use of MapReduce and the Hadoop Distributed File System (HDFS). The language for the platform is called Pig Latin, which abstracts the Java MapReduce idiom into a form similar to SQL. Pig Latin is a data-flow language whereas SQL is a declarative language: SQL is great for asking a question of your data, while Pig Latin allows you to write a data flow that describes how your data will be transformed. Since Pig Latin scripts can be graphs (instead of requiring a single output), it is possible to build complex data flows involving multiple inputs, transforms and outputs. Users can extend Pig Latin by writing their own functions in Java, Python, Ruby or other scripting languages. A user can run Pig in two modes:
• Local Mode: With access to a single machine, all files are installed and run using the local host and file system.
• MapReduce Mode: This is the default mode, which requires access to a Hadoop cluster.
Step 1: Launching the Pig Grunt shell: By default Pig uses MapReduce mode and connects to the HDFS file system, as specified below.
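The mode is chosen with the -x flag when launching the shell:

$ pig -x local       # local mode, using the local file system
$ pig                # MapReduce mode, the default (same as: pig -x mapreduce)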


Step 2: Copy the sample.txt file from the pigSamples folder as specified below.

Step 3: Use Pig to load the file and create a Pig data structure as specified below.

Step 4: Validate that the data structure is created as specified below.

When you observe the data structure, it confirms Pig is working fine and the environment can be handed over to developers.

Pig Detailed Lab:
Step 1: Creating a data file. Create a text file named "movies.txt" in your local file system on the sandbox and add the following content (the leading id column matches the schema used in the LOAD statement below):

1,The Nightmare Before Christmas,1993,3.9,4568
2,The Mummy,1932,3.5,4388
3,Orphans of the Storm,1921,3.2,9062
4,The Object of Beauty,1991,2.8,6150
5,Night Tide,1963,2.8,5126
6,One Magic Christmas,1985,3.8,5333
7,Muriel's Wedding,1994,3.5,6323
8,Mother's Boys,1994,3.4,5733
9,Nosferatu: Original Version,1929,3.5,5651
10,Nick of Time,1995,3.4,5333

Now load the file "movies.txt" into a directory on HDFS named '/user/hadoop' using the shell command as shown:

hadoop fs -put movies.txt /user/hadoop

Using Pig's Grunt shell interface: Type "pig" at the local shell prompt to get into Pig's Grunt shell. Load the content of "movies.txt" into a variable named "Movies":

grunt> Movies = LOAD '/user/hadoop/movies.txt' USING PigStorage(',') as (id,name,year,rating,duration);

Or, if you want to assign types:

grunt> Movies = LOAD '/user/hadoop/movies.txt' USING PigStorage(',') as (id:int,name:chararray,year:int,rating:float,duration:int);

The commands are executed as shown below:

To get the content of the variable "Movies", use the following command:

DUMP Movies;


The command 'DUMP' would execute as follows:

To check the format of the variable "Movies", use the following command:

Describe Movies;

Step 2: Filtering data: Now that the data is loaded into the variable "Movies", let's filter the data for movies with a rating greater than 3.5 using the following command:

grunt> movies_greater_than_three_point_five = FILTER Movies BY rating > 3.5;

From the variable 'movies_greater_than_three_point_five', let's extract the values for 'year', 'rating' and 'name' and save them in another variable named 'foreachexample':

grunt> foreachexample = foreach movies_greater_than_three_point_five generate year,rating,name;
grunt> dump foreachexample;

Filter result

Step 3: Storing variable values into HDFS: Let's store the values of the variable 'movies_greater_than_three_point_five' into HDFS:

grunt> STORE movies_greater_than_three_point_five INTO '/user/hadoop/movies_greater_than_three_point_five' USING PigStorage(',');


On any error condition, Hadoop immediately throws an exception; in the case above, there was a 'file not found' error. Now that we have the data in HDFS, use the 'cat' command to open the processed file:

grunt> cat /user/hadoop/movies_greater_than_three_point_five/part-m-00000

Step 4: File commands: Pig's Grunt shell has commands that can run on HDFS as well as on the local file system.

grunt> cat /user/hadoop/movies.txt
grunt> ls /user/hadoop/
grunt> cd /user/
grunt> ls
grunt> cd /user/hadoop
grunt> ls
grunt> copyToLocal /user/hadoop/movies.txt /home/
grunt> pwd

Step 5: To get help in Pig, simply type "help" in the Grunt shell.

LAB 5: CONFIGURING YARN CAPACITY SCHEDULER USING AMBARI


YARN's CapacityScheduler is designed to run Hadoop applications in a shared, multi-tenant cluster while maximizing the throughput and the utilization of the cluster. This lab provides a step-by-step approach to configuring YARN. Step 1:

Step 2: When we log in by entering admin/admin as the user ID and password, we get to the dashboard.


Step 3: Click on Services to get the following.

Step 4: Select YARN in the dashboard.

Step 5: Update the configuration for the YARN capacity scheduling policies.

Step 6: Scroll down to the scheduler as specified below.



Step 7: Scroll up to the top of the page and click on Quick Links, then select ResourceManager UI from the dropdown.

Step 8:


As you can see below, we only have the default policy.

Step 9: Let's change the capacity scheduling policy to one with separate queues and policies for the Engineering, Marketing and Support departments. Prepare the lab accordingly with your mentor's input. Open the file yarnpolicy1.txt and paste its contents into the Capacity Scheduler text area as specified below.
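The authoritative contents are in yarnpolicy1.txt; purely as an illustrative sketch (queue names from this lab, capacity percentages made up, summing to 100), a three-queue configuration looks like this:

yarn.scheduler.capacity.root.queues=engineering,marketing,support
yarn.scheduler.capacity.root.engineering.capacity=60
yarn.scheduler.capacity.root.engineering.maximum-capacity=100
yarn.scheduler.capacity.root.marketing.capacity=20
yarn.scheduler.capacity.root.support.capacity=20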


Step 10: Click Save and confirm in the dialog box:

Step 11: At this point the configuration is saved, but we still need to restart the components affected by the configuration change, as indicated in the orange band below:


Step 12: Also note that there is now a new version of the configuration as indicated by the green Current label. Let’s restart the daemons by clicking Restart All.

Step 13: Wait for the restart to complete:


Step 14: Go to the browser tab with the Capacity Scheduler policy and refresh the page. There’s our new policy:


LAB 6: SETTING UP NAMENODE HIGH AVAILABILITY


Step 1: On Ambari Web, go to the Admin view. Select High Availability in the left navigation bar.

Step 2: Check to make sure you have at least three hosts in your cluster and are running at least three ZooKeeper servers. Click Enable NameNode HA and follow the Enable NameNode HA Wizard. The wizard describes the set of automated and manual steps that you must take to set up NameNode high availability. Get Started: This step gives you an overview of the process and allows you to select a Nameservice ID. You use this Nameservice ID instead of the NameNode FQDN once HA has been set up. Click Next to proceed. Step 3: Enable NameNode HA Wizard as specified below.


Step 4: Select Hosts: Select a host for the additional NameNode and the JournalNodes. The wizard suggests options that you can adjust using the dropdown lists. Click Next to proceed.

Step 5: Review and Confirm your host selections and click Next.

Step 6: Create Checkpoints:


Follow the instructions in the step. You need to log in to your current NameNode host to run the commands that put your NameNode into safe mode and create a checkpoint. When Ambari detects that this is successful, the message at the bottom of the window changes. Click Next.
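The wizard displays the exact commands for your cluster; they are typically of this form, run as the hdfs user on the current NameNode host:

$ sudo su -l hdfs -c 'hdfs dfsadmin -safemode enter'
$ sudo su -l hdfs -c 'hdfs dfsadmin -saveNamespace'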

Step 7: Configure Components: The wizard configures your components, displaying progress bars to let you track the steps. Click Next to continue.

Step 8: Initialize JournalNodes: You need to log in to your current NameNode host to run the command that initializes the JournalNodes. When Ambari detects it is successful, the message at the bottom of the window changes. Click Next.
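Again the wizard shows the exact command; it is typically:

$ sudo su -l hdfs -c 'hdfs namenode -initializeSharedEdits'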

Step 9: Start Components: The wizard starts the ZooKeeper servers and the NameNode, displaying the progress bars to let you track the steps. Click Next to continue.

Step 10: Initialize Metadata: For this step you must log in to both the current NameNode and the additional NameNode. Make sure you are logged in to the correct host for each command. Click Next when you have completed the two commands. A confirmation popup appears to remind you that you must do both steps. Click OK to confirm.
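The two commands are typically of this form; run each on the host the wizard indicates:

# on the current NameNode host:
$ sudo su -l hdfs -c 'hdfs zkfc -formatZK'
# on the additional NameNode host:
$ sudo su -l hdfs -c 'hdfs namenode -bootstrapStandby'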


Step 11: Finalize HA Setup: The wizard shows the setup, displaying the progress bars to let you track the steps. Click Done to finish the wizard. After the Ambari Web GUI reloads, you may see some alert notifications. Wait for a few minutes until the services come back up. If necessary, restart any components using Ambari Web.

After completing all the steps in the HA wizard, choose Services and start Nagios.


LAB 7: INSTALLING STORM ON YARN FOR REAL TIME PROCESSING

Apache Storm is a free, open source, distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. This lab will help you configure your HDP 2.0 cluster with Storm.

Step 1: Test that Ambari is installed and Hadoop is set up properly.

Step 2: Get a copy of the repository for Storm on YARN from GitHub:
wget https://github.com/yahoo/storm-yarn/archive/master.zip

Step 3: Unzip the master file and move to the storm-yarn folder:
unzip master
cd storm-yarn-master

Step 4: Install Maven 3.1.1:
wget http://mirror.symnds.com/software/Apache/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz

Step 5: Untar Maven 3.1.1:
tar -zxvf apache-maven-3.1.1-bin.tar.gz

Step 6: Move the Maven binary to /usr/lib/maven:
mv apache-maven-3.1.1 /usr/lib/maven

Step 7: Add Maven to the PATH environment variable:
export PATH=$PATH:/usr/lib/maven/bin

Step 8: Edit pom.xml.


Step 9: Set up Storm on the cluster:
1. Create a work folder to hold the working files for Storm.
2. Copy these files to your work folder and set up the environment variables:
cp lib/storm.zip /your/work/folder
3. Go to your work folder and unzip storm.zip.
4. Add storm.zip to HDFS at /lib/storm/0.9.0-wip2/storm.zip:
hdfs dfs -put storm.zip /lib/storm/0.9.0-wip2/
5. Add the storm-0.9.0-wip2 and storm-yarn-master bin folders to the PATH. Make sure to update your work folder!
export PATH=$PATH:/usr/lib/maven/bin:/your/work/folder/storm-0.9.0-wip2/bin:/your/work/folder/storm-yarn-master/bin
6. Start Maven in the storm-yarn-master folder:
cd storm-yarn-master
mvn package

Start Storm:
1. Edit the storm.yaml file at storm-0.9.0-wip2/conf/storm.yaml to include your ZooKeeper servers.
2. Store this file for safekeeping if desired. Then run:
storm-yarn launch <path to your storm.yaml file>
3. Get the Storm configuration with the YARN application ID:
yarn application -list
4. We store the storm.yaml file in the .storm directory so the storm command can find it when submitting jobs.
5. Try visiting http://localhost:7070


LAB 8: INSTALLING AMBARI ON CLOUD

This lab provides a step-by-step approach to installing Ambari for Hadoop installation and management.

Step 1: Check the prerequisites:
1. An Amazon Web Services account with the ability to launch 7 large EC2 instances.
2. A Mac or a Linux machine. You could also use Windows, but you will have to install additional software such as SSH and SCP clients.
3. Lastly, we assume that you have basic familiarity with EC2 to the extent that you have created EC2 instances and SSH'd in.

Step 2: Get the connection details of your server instance from the administrator along with the key, and store the key locally. In the current environment, the keys are located on your desktop as shown below:

Step 3: Get your public and private DNS names from the batch administrator. This information will be used to connect to the instances using PuTTY or Cygwin. This lab uses the following public and private DNS names:
Public DNS: ec2-54-205-232-244.compute-1.amazonaws.com
Private DNS: ip-10-186-145-41.ec2.internal
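If you are on a Mac or Linux machine, you can connect with the .pem key directly instead of PuTTY; the key file name below is hypothetical:

$ chmod 400 ~/Desktop/lab-key.pem
$ ssh -i ~/Desktop/lab-key.pem ec2-user@ec2-54-205-232-244.compute-1.amazonaws.com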


Step 4: Connect to the instance using PuTTY. Click on the PuTTY icon to launch it.

Step 5: In the Host Name box, enter user_name@public_dns_name. Be sure to specify the appropriate user name for your AMI. For example:
1. For an Amazon Linux AMI, the user name is ec2-user.
2. For a RHEL5 AMI, the user name is either root or ec2-user.
3. For an Ubuntu AMI, the user name is ubuntu.
4. For a Fedora AMI, the user name is either fedora or ec2-user.
5. For SUSE Linux, the user name is root.
Otherwise, if ec2-user and root don't work, check with the AMI provider.
• Under Connection type, select SSH.
• Ensure that Port is 22.


Step 6: In the Category pane, expand Connection, expand SSH, and then select Auth. Complete the following:
1. Click Browse.
2. Select the .ppk file that you generated for your key pair, and then click Open.
3. (Optional) If you plan to start this session again later, you can save the session information for future use. Select Session in the Category tree, enter a name for the session in Saved Sessions, and then click Save.
4. Click Open to start the PuTTY session.


Step 7: Once you are connected, follow the steps below.

Step 8: Setting up Ambari: Get the bits of HDP and add it to the repo:
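A sketch assuming a RHEL/CentOS 6 instance and an Ambari 1.x repository; substitute the version your mentor provides:

$ wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/<version>/ambari.repo
$ cp ambari.repo /etc/yum.repos.d/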

Step 9: Refresh the repo:
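On a yum-based system the refresh amounts to:

$ yum clean all
$ yum repolist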

Step 10:


Install the Ambari server:

Step 11: Download the required bits:

Step 12: Agree to download the key:



Step 13: The Ambari server installation is successful:

Step 14: Once Ambari server is installed, you need to configure it:


Now accept all the defaults:

Once you complete the configuration steps, test the Ambari server startup. Step 15: Start the server as specified below:


Step 16: Test the Ambari server by opening port 8080 on your public DNS in a browser:

This lab installs the Ambari server, which will be used for the installation and configuration of HDP 2.0.


LAB 9: INSTALLING APACHE HADOOP 2.0


This lab installs Apache Hadoop and elaborates the steps to get Hadoop running. Step 1: Create the user hadoop as specified below:

Step 2: Add a password; we used 'hadoop' as the password:

Step 3: Change the user to hadoop:

Step 4: Use the following steps to generate a key using ssh-keygen:


Step 5: Copy the key to authorized_keys:
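Steps 1 through 5 consolidate to commands of this form; run the first three as root and the rest as the hadoop user:

# useradd hadoop
# passwd hadoop
# su - hadoop
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost                      # should log in without a password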

Step 6: Download Hadoop:
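The screenshots in this lab use Hadoop 2.4.0 (the version visible in the format output in Step 17); a matching download command would be:

$ wget http://archive.apache.org/dist/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz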

Step 7:

Step 8: Once the download is done, it shows the following:


Step 9: Unzip the tar as shown:
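Assuming the 2.4.0 tarball from Step 6 and the HADOOP_HOME used later in Step 13:

$ tar -xzf hadoop-2.4.0.tar.gz
$ mv hadoop-2.4.0 /home/hadoop/hadoop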

Step 10:

Step 11: Ensure you have Java installed; execute the following steps:

Step 12: Set JAVA_HOME to the installed Java and check the Hadoop version as specified below:

Now that Hadoop is installed, we need to configure it as per our architecture.

Step 13: Set up the environment: Edit the ~/.bashrc file and append the following values at the end of the file:

export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Step 14: Apply the changes:

$ source ~/.bashrc

Step 15: Configure the required files: Hadoop has many configuration files, which need to be configured as per your Hadoop infrastructure requirements. Let's start with the configuration for a basic Hadoop single-node cluster setup. First navigate to the configuration directory ($HADOOP_HOME/etc/hadoop).

Edit core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Edit hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>

Edit mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Edit yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Step 16: Format the NameNode:

$ hdfs namenode -format

Step 17: Observe the output as specified below for successful formatting:

14/05/04 21:30:55 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = svr1.tecadmin.net/192.168.1.11
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.4.0
...
...
14/05/04 21:30:56 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
14/05/04 21:30:56 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/05/04 21:30:56 INFO util.ExitUtil: Exiting with status 0
14/05/04 21:30:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at svr1.tecadmin.net/192.168.1.11
************************************************************/

Step 18: Start the services: Let's start your Hadoop cluster using the scripts provided by Hadoop. Just navigate to your Hadoop sbin directory and execute the scripts one by one:

$ cd $HADOOP_HOME/sbin/

Now run the start-dfs.sh script:

$ start-dfs.sh

[Sample output]
14/05/04 21:37:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-svr1.tecadmin.net.out
localhost: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-svr1.tecadmin.net.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-svr1.tecadmin.net.out
14/05/04 21:38:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Now run the start-yarn.sh script:

$ start-yarn.sh

[Sample output]
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-svr1.tecadmin.net.out
localhost: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-svr1.tecadmin.net.out

Step 19: Access the NameNode as specified below. The Hadoop NameNode web UI starts on port 50070 by default; access your server on port 50070 in your favorite web browser:
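As an optional check before browsing the web UIs, the jps command lists the running Java daemons:

$ jps
# expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager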

Step 20: Now access port 8088 to get information about the cluster and all applications:


Step 21: Access port 50090 to get details about the Secondary NameNode:

Step 22: Access port 50075 to get details about the DataNode:


If all tests complete successfully, our Hadoop installation is done.
