Taxmann's Business Analytics by Taxmann

Preface

In today’s rapidly evolving business landscape, the ability to harness the power of data has become paramount for success. As the business environment continues to grow in complexity, the demand for professionals skilled in deciphering data patterns and transforming them into actionable strategies has never been higher. As we delve into the intricacies of this dynamic field through this book, our aim is to equip students with a solid foundation in both the theoretical concepts and practical applications of Business Analytics.

This book serves as a comprehensive guide to Business Analytics tailored for undergraduate and postgraduate students, with a special focus on the new University of Delhi syllabus under UGCF 2022, which is based on the National Education Policy (NEP) 2020.

This book has been divided into 8 chapters. From data collection and preprocessing to advanced predictive modelling and data visualization using R, this book covers a wide spectrum of topics relevant to Business Analytics. In this book, we strike a balance between theory and application, providing a solid theoretical foundation while emphasizing the practical aspects of Business Analytics. Real-life case studies, hands-on examples, and exercises are woven throughout the book to help students bridge the gap between concepts and their real-world implementation.

As authors, we invite you to embark on this educational journey of Business Analytics, exploring the world of data-driven insights and strategic thinking. We would like to thank the editorial and production departments at Taxmann Publishing for their efforts in bringing out this book.

H. K. Dangi
Gurveen Kaur

Contents

Preface
Chapter-heads

CHAPTER 1 : INTRODUCTION
1.1 Introduction
1.1-1 History of the term ‘business analytics’
1.1-2 Architectural framework of business analytics
1.2
Analysis and analytics
Types of analytics
Descriptive analytics
Predictive analytics
Prescriptive analytics
Application of analytics
Finance
Marketing
Human resource
Healthcare
1.6 Case study on silverwind company
1.7 Summary
1.8 Exercise Questions
1.9 Multiple Choice Questions

CHAPTER 2 : DATA PREPARATION
2.1 Introduction
2.1-1 Applications of data preparation
2.1-2 Data preparation process
2.2 Getting started with MS-Excel
Data preparation and cleaning using Excel
Filter and sort
Conditional formatting
Removing duplicates
Data validation
Identifying outliers in the data
2.6 Covariance
2.7 Correlation matrix
2.8 Apply to business
2.9 Summary
2.10 Exercise Questions
2.11 Multiple Choice Questions

CHAPTER 3 : DATA SUMMARISATION AND VISUALISATION
Introduction
Data summarisation
Types of data summarisation
Data visualisation
Data visualisation using Tableau
Getting started with Tableau
Exercise Questions
Multiple Choice Questions

CHAPTER 4 : GETTING STARTED WITH R
Introduction to R
Introduction to R Studio
Understanding the Distinction: R vs. R Studio
Advantages of R
Getting started with R
Installation of R
Installing R Studio
Installing R packages
Importing data in R Studio
Data structures in R
Vector
Matrix
Array
Lists
Factors
Data frame
Apply to business
Summary
Exercise Questions
Multiple Choice Questions

CHAPTER 5 : DESCRIPTIVE STATISTICS USING R
Introduction
Measure of Central Tendency
Arithmetic Mean
Mode
Median
Measures of Dispersion
Range
Standard Deviation (SD)
Variance
Relationship between Variables
Covariance
Correlation
Data Visualisation Using R
Histograms
Box plot
Line Plot
Scatter Plots
Apply to Business
Summary
Exercise Questions
Multiple Choice Questions

CHAPTER 6 : PREDICTIVE ANALYTICS
6.1 Introduction
6.2 Simple Linear Regression Model
6.2-1 Assumptions of Simple Linear Regression Model
6.2-2 Simple Linear Regression Formula
Simple Linear Regression Using R
Multiple Linear Regression
Assumptions of Multiple Linear Regression
Regression Analysis Using R
Preliminary Steps
Autocorrelation Testing Using DWT
Regression Testing Using R
6.6 Apply to Business
6.7 Summary
6.8 Exercise Questions
6.9 Multiple Choice Questions

CHAPTER 7 : TEXTUAL ANALYSIS
7.1 Introduction
7.2
Applications of Textual Data Analysis
Challenges of Textual Data Analysis
Introduction to Textual Analysis Using R
7.6 Methods and Techniques of Textual Analysis
7.6-1 Bar Chart
7.6-2 Word Cloud
Tree Map
Word Association (Correlation)
Sentiment Analysis
7.6-6 Cluster Analysis
7.7 Apply to Business
7.8 Summary
7.9 Exercise Questions
7.10 Multiple Choice Questions

CHAPTER 8 : ETHICS IN BUSINESS ANALYTICS
8.1 Introduction
8.2 Analytics Ethics – Meaning
8.2-1 Importance of Ethics in Analytics
Ethical Issues in Analytics
Consideration for Ethical Conduct in Analytics
Apply to Business
8.6 Summary
8.7 Exercise Questions
8.8 Multiple Choice Questions

CHAPTER 3 Data Summarisation and Visualisation

Learning outcomes

After completing chapter 3, you should be able to:

Define the term ‘Data Summarisation’

Understand the concept of ‘Data Visualisation’

Learn data visualisation using Tableau

Learn data visualisation using advanced MS-Excel spreadsheet

“Visualization gives you answers to questions you didn’t know you had.”

- Ben Shneiderman, American computer scientist

3.1 INTRODUCTION

Today, the world is drowning in a sea of data, with data transfers taking place every millionth of a second. According to a 2022 Statista report, over 90% of the world’s data has been generated in the past two years alone. This is attributed to the ever-increasing number of internet users worldwide, web searches, the explosion of smart devices, online services, social media, and digital media. According to the same report, there are approximately 22 billion connected smart devices worldwide, a number estimated to reach 50 billion by 2030.

Data is a valuable resource for every business across sectors. On a day-to-day basis, businesses come across diverse types of data, such as feature-rich data, large-scale data, and high-value data. Businesses leverage these information-intensive datasets to devise strategies for their organisations. For example, leading apparel, retail, and food companies hold vast amounts of data in the form of customer information, product details, promotional activities, and so on. These companies leverage this data to devise customised promotion strategies.

Timely interpretation and comprehension of data benefits organisations in numerous ways, for instance by expanding the customer base, devising business strategies, developing brand loyalty, setting competitive prices, cutting costs, and improving overall efficiency.

Despite the numerous advantages that data offers, most companies are unable to take full advantage of their data, for the following reasons:

It is practically not feasible to analyse such large volumes of data, whether real-time, current, or historical. This is where data summarisation and visualisation prove useful. Crisp, insightful, and comprehensive summaries of the data help firms identify existing opportunities and devise strategies accordingly.

Also, in the real world, things are variable. To understand a situation, one needs to take several measurements, then summarise the results and draw meaningful inferences from them.

3.2 DATA SUMMARISATION

Data summarisation refers to presenting a compact description of a dataset. In other words, it is the presentation of a dataset in an easy, informative, and comprehensive manner. It can be regarded as an abridged form of the dataset, in which the data is compressed into smaller sets while retaining as much information as possible. A carefully prepared summary is derived from the entire dataset and reveals significant patterns and trends clearly.

Data summarisation is the foremost step of data mining and helps in choosing an appropriate statistical tool or technique based on the trends the summary reveals. Some examples where data summarisation can help are as follows:

A media house wants to find out how effectively its various channels are performing on the basis of variables such as viewership, number of shows aired, and target audience.

The HR manager of a company wants to keep a record of the company’s workforce and monitor it based on attributes such as vacancies, employee turnover, and transfers.

A pharmaceutical company wants to keep track of its medical representatives (MRs) on the basis of variables such as targets achieved, coverage, and outreach.

A retail company needs to test the market sentiment for its newest product based on data gathered from consumers’ social networking accounts.

In all the above examples, data summarisation tools and techniques come to the rescue by providing deep insights into the situation.


3.2-1 Types of data summarisation

Based on the statistical operations, there are three ways in which data can be summarised (Figure 1).

Figure 1 : Types of Data Summarisation

Source: Compiled by authors

These are discussed as follows:

1. Based on Centrality

Data can be summarised on the basis of its centrality. The centrality of a dataset describes its centre or middle value. In other words, it identifies one central value around which all other values of the dataset revolve. Another name for centrality is ‘average’.

There are several ways to find the centrality of a dataset. The most popular are the mean, mode, and median. These three measures summarise the distribution of the dataset.

Mean

The mean is the numerical average of a dataset. The arithmetic mean is calculated by adding all the values of the dataset and dividing the sum by the number of items. The mathematical formula is as follows:


$$\bar{x} = \frac{\sum x}{n}$$

‘n’ represents ‘number of items’

The following steps are used to calculate mean using MS-Excel:

Step 1: Click on an empty cell

Step 2: Type ‘=AVERAGE(cell range)’, for example ‘=AVERAGE(A1:A15)’

Step 3: Press ‘ENTER’ and the mean will be displayed.

For example, calculate the average marks of the students (out of 15) from the sample given below:

12, 13, 15, 12, 10, 13, 14, 12, 10, 12

Step 1: Click on an empty cell (Figure 2)

Figure 2

Step 2: Type ‘=AVERAGE(B2:B11)’ (Figure 3)

Figure 3

Step 3: Press ‘ENTER’. Mean of the dataset is 12.3 (Figure 4)


Figure 4
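As a cross-check outside the spreadsheet, the same calculation can be sketched in Python. This is a minimal illustration rather than part of the Excel workflow; the function name `arithmetic_mean` and the variable names are assumptions made for this example.

```python
# Marks of ten students (out of 15), taken from the example above.
marks = [12, 13, 15, 12, 10, 13, 14, 12, 10, 12]


def arithmetic_mean(values):
    """Add all the values and divide the sum by the number of items."""
    return sum(values) / len(values)


mean_marks = arithmetic_mean(marks)
print(mean_marks)  # 12.3, matching the AVERAGE result
```

The function mirrors the formula directly: `sum(values)` plays the role of the summation and `len(values)` the role of n.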

Mode

Mode refers to the most recurring value in the sample. In other words, it is the most frequent number in the given dataset. Mode is comparatively less preferred in statistical analysis. Although it can be calculated for any type of sample, it is mostly used where the sample size is large or the given values are integers.

The following steps are used to calculate mode using MS-Excel:

Step 1: Click on an empty cell

Step 2: Type ‘=MODE(cell range)’, for example ‘=MODE(C1:C15)’

Step 3: Press ‘ENTER’ and the mode will be displayed.

For example, the following are the marks of 10 students in the class:

12, 13, 15, 12, 10, 13, 14, 12, 10, 12

Calculate its mode.

Step 1: Click on an empty cell (Figure 5)

Figure 5

Step 2: Type ‘=MODE(B2:B11)’ (Figure 6)

Step 3: Press ‘ENTER’. The mode is 12 (Figure 7)

Figure 6 Figure 7
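The idea of the ‘most frequent value’ can also be sketched in Python by counting how often each mark occurs. This is an illustrative sketch under stated assumptions: the helper name `modes` is hypothetical, and returning every tied value as a list is a design choice made here, not a description of how Excel handles ties.

```python
from collections import Counter

# Marks of ten students, taken from the example above.
marks = [12, 13, 15, 12, 10, 13, 14, 12, 10, 12]


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)          # value -> number of occurrences
    highest = max(counts.values())    # the largest frequency found
    return sorted(v for v, c in counts.items() if c == highest)


mode_marks = modes(marks)
print(mode_marks)  # [12], since 12 occurs four times
```

Returning a list makes it visible when a dataset has more than one most frequent value, a case that a single-number result would hide.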

Median

Median refers to the middle value of the series when arranged in ascending or descending order. With an even number of values, the median is the average of the two middle values. When the distribution is normal, the mean and median tend to coincide.
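The definition can be sketched in Python by sorting the values and picking the middle position. This is a minimal illustration using the marks from the earlier examples; the function name `median_value` is an assumption made for this sketch.

```python
# Marks of ten students, as used in the mean and mode examples.
marks = [12, 13, 15, 12, 10, 13, 14, 12, 10, 12]


def median_value(values):
    """Return the middle value of the sorted data.

    For an even number of items, average the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


median_marks = median_value(marks)
print(median_marks)  # 12.0: both middle values of the sorted marks are 12
```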

The following steps are used to calculate median using MS-Excel:

Step 1: Click on an empty cell

Step 2: Type ‘=MEDIAN(cell range)’, for example ‘=MEDIAN(C1:C15)’

Step 3: Press ‘ENTER’ and the median will be displayed.

For example, the following are the marks of 10 students in the class: 12, 13, 15, 12, 10, 13, 14, 12, 10, 12

Step 1: Click on an empty cell (Figure 8)


Figure 8

Step 2: Type ‘=MEDIAN(B2:B11)’ (Figure 9)

Figure 9

Step 3: Press ‘ENTER’. The median is 12 (Figure 10).

Figure 10

2. Based on Dispersion

The term ‘dispersion’ means ‘spread’. To elaborate, dispersion describes how scattered the sample values are around the mean. It shows the variability present within the given data. If the values are scattered far away from the mean, the dispersion of the sample is said to be high, whereas values close to the mean indicate low dispersion.

Figure 11 : Symmetrical Distribution

Figure 11 depicts two normally distributed samples, i.e., symmetrical distributions. However, the principle of dispersion remains the same for any shape of distribution. Different measures of dispersion suit different data distributions.
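To make the idea of spread concrete, the sketch below compares two samples that share the same mean but differ in how far their values lie from it. Both samples are hypothetical numbers invented for this illustration, and the ‘mean absolute distance’ used here is a simple stand-in chosen for readability, not one of the measures defined in this chapter.

```python
# Two hypothetical samples with the same mean (12) but different spread.
tight_sample = [11, 12, 12, 13, 12]
wide_sample = [4, 20, 12, 8, 16]


def mean_abs_distance(values):
    """Average absolute distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


tight_spread = mean_abs_distance(tight_sample)
wide_spread = mean_abs_distance(wide_sample)

# The sample whose values lie farther from the mean has the larger spread.
print(tight_spread, wide_spread)
```

Although both samples have the same central value, the second one yields a much larger figure, which is what a higher dispersion means.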

Various measures of dispersion include the following:

Standard Deviation (SD)

Standard deviation is the most widely used measure of dispersion. It is used with normally distributed data and shows how far the values are spread from the mean. Put differently, it reflects the presence of extremely small or extremely large values in the data, and thus gives an understanding of how scattered the data is. It is also loosely described as the ‘average deviation’ from the mean. The formula for SD is

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Here, s represents the sample SD.

Generally, a sample is taken from a larger population, so most statistical analyses estimate the sample standard deviation (s), with n − 1 as the denominator. The n − 1 divisor is considered a ‘compensation factor’ for working with a sample rather than the whole population. For the population SD, also known as the ‘true SD’, the divisor is n. When n is large, and the sample is thus closer to the population, subtracting 1 from n does not affect the result much.

The following steps are used to calculate SD using MS-Excel:

Step 1: Click on an empty cell

Step 2: Type ‘=STDEV.S(cell range)’, for example ‘=STDEV.S(D1:D5)’

Step 3: Press ‘ENTER’ and the SD will be displayed.

For example, the following are the marks of 10 students in the class: 12, 13, 15, 12, 10, 13, 14, 12, 10, 12

Calculate the standard deviation.

Step 1: Click on an empty cell (Figure 12)


Figure 12
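The sample SD formula, with n − 1 as the divisor, can be sketched in Python for the same marks. This is an illustrative sketch: the function name `sample_sd` is an assumption, and the numeric result is computed here rather than taken from the text. Python's standard `statistics` module is used only as a cross-check, with `stdev` dividing by n − 1 and `pstdev` dividing by n.

```python
import math
import statistics

# Marks of ten students, taken from the example above.
marks = [12, 13, 15, 12, 10, 13, 14, 12, 10, 12]


def sample_sd(values):
    """Sample standard deviation with n - 1 as the divisor."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


s = sample_sd(marks)                        # manual formula, divisor n - 1
s_check = statistics.stdev(marks)           # library sample SD, divisor n - 1
population_sd = statistics.pstdev(marks)    # population SD, divisor n

print(round(s, 3), round(population_sd, 3))
```

For these ten marks the sample SD comes to roughly 1.57. The population figure is slightly smaller because it divides by n instead of n − 1, and the gap narrows as n grows.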

Business Analytics

AUTHOR : H.K. DANGI, GURVEEN KAUR

PUBLISHER : TAXMANN

DATE OF PUBLICATION : APRIL 2024

EDITION : 2024 EDITION

ISBN NO : 9789357786690

NO. OF PAGES : 264

BINDING TYPE : PAPERBACK

Rs. 425 | USD 6

Description

This book emphasises the critical role of data in today’s evolving business landscape. It highlights the increasing complexity of the business environment and the growing demand for professionals adept at analysing data patterns and translating them into actionable strategies.

This book is designed to progressively build the reader’s knowledge in business analytics, from fundamental concepts to specialised techniques and ethical considerations, complete with practical applications and exercises for reinforcement.

The present publication is the latest edition, focusing on the latest syllabus under UGCF 2022 and aligning with the National Education Policy (NEP) adopted by the University of Delhi. This book is authored by Prof. H.K. Dangi and Gurveen Kaur, with the following noteworthy features:

• [Balanced Approach Between Theory and Practice] The book maintains an equilibrium between theoretical knowledge and practical application. It lays a solid theoretical foundation in Business Analytics while also emphasising its practical aspects

• [Real-World Application and Hands-On Learning] Incorporating real-life case studies, hands-on examples, and exercises, the book ensures that students can connect theoretical concepts with their implementation in the real world

• [Educational Journey in Business Analytics] This book offers insights into data-driven decision-making and strategic thinking

The structure of the book is as follows:

• [Learning Outcomes] Every chapter begins with the list of learning outcomes which the readers will achieve after the completion of the chapter

• [Headings/Sub-headings] Chapters are further divided into headings and sub-headings to increase the reader’s comprehension

• [Practice & Discussion Questions] Each chapter contains a series of practice/discussion questions to help the reader review the material

• [Case Studies] are provided at the end of each chapter to help readers implement their learning into hypothetical real-life situations
