Statistics I


Department of Finance

Course Code: 1206


Prepared by Md. Mazharul Islam (Jony). Roll no: 091541, 3rd Batch. Department of Finance. Jagannath University.


Department of Finance, Jagannath University

JONY


Statistics-I

The word statistics originated from either the Italian word 'statista' or the German word 'statistik'. Both words refer to the political state of a nation. Today, by statistics we mean quantitative data affected to a marked extent by a multiplicity of causes.

Definition of Statistics: It is almost impossible to capture statistics in one compact definition. Different statisticians have defined it from different perspectives, and the word is used in two different senses.
In the singular sense: statistics refers to the principles, formulas and methods through which numerical facts are collected, analysed and expressed.
In the plural sense: statistics means the numerical facts themselves, drawn from day-to-day life. Examples: a) The recent per capita (average) expenditure of the office workers of an industry. b) The annual birth rate or death rate of Bangladesh, etc.

Statistics: Statistics may be defined as the collection, processing, presentation, analysis and interpretation of numerical data affected by a multiplicity of causes. “The science of statistics is essentially a branch of applied mathematics and may be regarded as mathematics applied to observational data.” - R.A. Fisher. In summary, statistics is the science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of inquiry.

Definition of Business Statistics: Business statistics is the collection, presentation, analysis and interpretation of numerical data regarding business. Examples: a) Production statistics. b) Buying and selling statistics. c) Export and import statistics, etc.

Population: In everyday usage, population means the aggregate of human individuals in a defined area or region. In statistics, population refers to the aggregate of all individuals or items defined on some common characteristic. Examples: a) All the students in a class. b) Students of Jagannath University. c) Students of the 2nd semester. d) Students of the session 2008-2009, etc.
A population can be classified into two groups:
Finite population: A population having a finite number of units, individuals or items is called a finite population. Example: The population consisting of the students of Jagannath University.



Infinite population: A population having an infinite number of units, individuals or items is called an infinite population. Example: The population consisting of all possible outcomes (head and tail) in successive tosses of a coin.
*** A statistical population need not have anything to do with a human population. Example: Suppose we want to know the average temperature of January 2010. Then the record of temperatures from 1st January 2010 to 31st January 2010 will constitute a population.
Population Size: The number of individuals or items in the population. It is usually denoted by capital “N”.

Sample: A representative and considerably small part of the population is called a sample. That is, a sample is a sub-set or portion of the population selected to represent the population. Example: A group of 30 students representing all the students in the class, is called a sample.

Sample Size: The number of individuals or items in the sample. It is usually denoted by small “n”.
Variable: A variable is a characteristic, often but not always quantitatively measured, that takes two or more values or categories and can vary from person to person, place to place and time to time. Examples: a) Gender is a variable composed of two categories, male and female. b) Family size. c) Price of a commodity. d) Monthly income. e) Height of a person. f) Weight of a person, etc.
Note: The values of a variable constitute data.
Classification of Variable:
Variable
   Qualitative / Categorical
   Quantitative / Numerical
      Discrete
      Continuous

Qualitative Variable: Characteristics that cannot be expressed in numerical form but can be arranged according to their quality or attribute are called qualitative variables. A qualitative variable is one for which numerical measurement is not possible. Examples: a) Hair color. b) Blood group. c) Religion. d) Residence (urban, rural, etc.).



Quantitative Variable: A quantitative variable is one for which the resulting observations are numeric and possess a natural ordering. Examples: a) Height. b) Weight. c) Income. d) Expenditure. e) Age. f) Family size, etc.

Types of Quantitative Variable
Discrete Variable: When a variable can assume only isolated or integral values within a given range, it is called a discrete variable. Here the possible values are not observed on a continuous scale; the values within the specified range are countable. Examples: a) Number of children in a family. b) Number of accidents per day in a city.
*** A discrete variable need not take only the values 0, 1, 2, 3, ...; the important point is that the possible values of the variable are countable and separated from each other. Example: a) Size of shoes, etc.

Continuous variable: A variable is said to be continuous if it can theoretically assume any value within a given range. The range consists of an infinite number of elements or values that is, the values of the variable differ by infinitely small magnitude from each other. Examples: a) Height. b) Weight. c) Age. d) Income. e) Price. Etc.

Constant: A numerical characteristic which never changes its value is termed a constant. Example: The ratio of the circumference to the diameter of a circle (π). The value π ≈ 3.1416 is the same for all circles.

Data: Data is the plural form of datum. Data are called the raw material of statistics. A collection of numbers or facts that is used as a basis for drawing conclusions is called data. In other words, data are the raw, disorganized facts and figures collected from any field of inquiry. Examples: a) The marks obtained by 100 students in a class test. b) Income of 50 people. c) Age of 20 students, etc.

Source of data: Data are collected from two types of sources. They are,
1. Primary data: Data obtained by direct observation from the population or sample are called primary data. Primary data are original in character and not yet organized, so they are also called raw data or original data. The collection of raw data is highly expensive in terms of money, time and labor.
2. Secondary data: Data which have already been obtained by some other persons or organizations and are already published or utilized are called secondary data. There are two different sources of secondary data:
i. Published sources: International publications, government and non-government publications, research institution publications, journals and newspapers, etc.
ii. Unpublished sources: Medical institutions, hospitals, etc.



Scales of measurement: Measurement is the assignment of numbers to objects or events according to rules. The standard against which something is compared when it is measured is called a measurement scale. Variables can be measured under four levels or scales of measurement:
i. Nominal Scale (qualitative data).
ii. Ordinal Scale (qualitative data).
iii. Interval Scale (quantitative data).
iv. Ratio Scale (quantitative data).

Nominal Scale: A level of measurement which classifies data into mutually exclusive categories that have no logical order, so that values are assigned to the categories for identification only, is called the nominal scale. Examples: a) Religion. b) Marital status. c) Blood group. d) Nationality, etc.

Ordinal Scale: The measurement scale in which numbers are assigned to the categories for identification as well as ranking is called the ordinal scale. Examples: a) Economic status. b) Level of education. c) Beauty. d) Merit, etc.

Interval Scale: The measurement scale in which numbers are assigned to the variable values in such a way that the data can be ranked and differences are meaningful is called the interval scale. Here the 'zero' value is not an absolute zero: it is an arbitrary point and does not indicate the absence of the quantity. Examples: a) Temperature. b) I.Q. score. c) Dates on a calendar, etc.

Ratio Scale: The measurement scale in which numbers are assigned to the variable values in such a way that the data can be ranked, differences are meaningful and there is a true 'zero' is called the ratio scale. Ratios between different measurements are therefore meaningful. Examples: a) Height. b) Weight. c) Age. d) Income. e) Price. f) Expenditure, etc.



Presentation of Data: Statistical data can be presented in the following ways: a) Classification and Tabulation. b) Graphical Presentation.

Classification and Tabulation: Classification is the process of arranging individuals in groups or classes according to their resemblance, similarity or identity. Its purpose is to condense the raw data and display points of similarity and dissimilarity by suppressing irrelevant details.

Tabulation involves the orderly presentation of numerical facts in tabular form so as to express their main features. Tabulation is the process of summarizing classified or grouped data in the form of a table, so that it is easily understood and an investigator is quickly able to locate the desired information. A table is a systematic arrangement of classified data in columns and rows.

Frequency distribution: A frequency distribution is a particular type of table. It shows the number of observations from the data set that fall into each of the classes. From a statistical point of view it is the most important form of tabulation. A set of classes together with the frequencies of occurrence of values in each class in a given set of data, presented in tabular form, is referred to as a frequency distribution.
Types of frequency distribution:
a) Categorical frequency distribution: A frequency distribution in which the data are categorical or qualitative, either nominal or ordinal.
b) Discrete frequency distribution: When the frequency distribution is based on the data of a discrete variable, it is called a discrete frequency distribution. A discrete frequency distribution can be made in grouped or ungrouped form.
c) Continuous frequency distribution: When the frequency distribution is based on the data of a continuous variable, it is called a continuous frequency distribution. A continuous frequency distribution can be made only in grouped form.

Construction of frequency distribution from categorical data
The variable smoking status is a nominal variable with two categories, smokers and non-smokers. By identifying the workers of BPC as smokers or non-smokers and counting the number in each category, we can construct the following frequency distribution:

Smoking Status    Number of Workers
Smokers           29
Non-smokers       21
Total             50



Frequency distribution for numerical data

Example 1: (For a discrete variable) In a survey of 40 families in a village, the number of children per family was recorded and the following data were obtained:

1 0 3 2 1 5 6 2 2 1
0 3 4 2 1 6 3 2 1 5
3 3 2 4 2 2 3 0 2 1
4 5 3 3 4 4 1 2 4 5

Represent the data in the form of a discrete frequency distribution.

Solution: Frequency distribution of the number of children.

Number of Children    Tally Marks    Frequency    Cumulative Frequency
0                     |||            3            3
1                     |||| ||        7            10
2                     |||| ||||      10           20
3                     |||| |||       8            28
4                     |||| |         6            34
5                     ||||           4            38
6                     ||             2            40
Total                                40

Example 2: (For a continuous frequency distribution, inclusive method) In a survey of 50 workers in BPC, the amount of wages per worker was recorded and the following data were obtained:

92 51 90 56 59 74 96 85 93 72
67 74 96 73 80 50 77 65 70 62
92 68 87 79 94 76 87 74 63 94
68 65 93 90 80 86 86 69 88 89
87 92 79 95 89 97 90 54 83 74

Represent the data in the form of a continuous frequency distribution.

Solution: Here, Range = Highest value – Lowest value = 97 – 50 = 47. The number of classes can be determined by the formula of H.A. Sturges:

$K = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 50 = 6.64 \approx 7$

(K = number of classes, n = total number of observations).

Size of class interval (class width) $= \dfrac{\text{Range}}{K} = \dfrac{47}{7} = 6.71 \approx 7$.

Frequency distribution of the amount of wages.



Wages (in Taka)    Tally Marks         Frequency    Cumulative Frequency
50 – 56            ||||                4            4
57 – 63            |||                 3            7
64 – 70            |||| ||             7            14
71 – 77            |||| |||            8            22
78 – 85            |||| |              6            28
86 – 92            |||| |||| ||||      14           42
93 – 99            |||| |||            8            50
Total                                  50

Example 3: (For a continuous frequency distribution, exclusive method) In a survey of 50 university students, the weight in kg of each student was recorded and the following data were obtained:

42 54 58 42 47 49 59 45 51 62
44 49 39 64 61 57 52 41 46 32
51 54 43 41 57 46 54 45 42 39
48 40 34 40 41 47 46 51 49 58
56 37 63 50 37 58 48 49 38 41

Represent the data in the form of a continuous frequency distribution.

Solution: Here, Range = Highest value – Lowest value = 64 – 32 = 32. The number of classes can be determined by the formula of H.A. Sturges:

$K = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 50 = 6.64 \approx 7$

(K = number of classes, n = total number of observations).

Size of class interval (class width) $= \dfrac{\text{Range}}{K} = \dfrac{32}{7} = 4.57 \approx 5$.

Frequency distribution of the weight in kg.

Class Interval (Weight in kg)    Tally Marks       Frequency    Cumulative Frequency    Relative Frequency (%)
30 – 35                          ||                2            2                       4
35 – 40                          ||||              5            7                       10
40 – 45                          |||| |||| |       11           18                      22
45 – 50                          |||| |||| |||     13           31                      26
50 – 55                          |||| |||          8            39                      16
55 – 60                          |||| ||           7            46                      14
60 – 65                          ||||              4            50                      8
Total                                              50
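The construction in Example 3 can be sketched in Python. This is only an illustrative sketch: the function names are hypothetical, rounding K and the class width up to the next whole number is an assumption that happens to match the rounding used in the example, and the starting point of 30 is taken from the table.

```python
import math
from itertools import accumulate

def sturges_classes(n):
    """Number of classes by Sturges' formula, rounded up (assumed rounding)."""
    return math.ceil(1 + 3.322 * math.log10(n))

def grouped_frequencies(data, start, width, k):
    """Count values in k exclusive classes [lower, upper) beginning at `start`."""
    counts = [0] * k
    for value in data:
        counts[int((value - start) // width)] += 1
    return counts

# Weights (kg) of the 50 students from Example 3
weights = [42, 54, 58, 42, 47, 49, 59, 45, 51, 62,
           44, 49, 39, 64, 61, 57, 52, 41, 46, 32,
           51, 54, 43, 41, 57, 46, 54, 45, 42, 39,
           48, 40, 34, 40, 41, 47, 46, 51, 49, 58,
           56, 37, 63, 50, 37, 58, 48, 49, 38, 41]

n = len(weights)
k = sturges_classes(n)                                  # number of classes
width = math.ceil((max(weights) - min(weights)) / k)    # class width
freqs = grouped_frequencies(weights, start=30, width=width, k=k)
cumulative = list(accumulate(freqs))                    # running totals
relative = [100 * f / n for f in freqs]                 # percentages

for i, f in enumerate(freqs):
    lower = 30 + i * width
    print(f"{lower} - {lower + width}: {f}  {cumulative[i]}  {relative[i]}%")
```

Because the classes are exclusive, a value equal to an upper limit (for example 40) is counted in the next class (40 – 45), as the exclusive method requires.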

Class Interval: The difference between the upper limit and the lower limit of a class is called the class interval. Examples: a) If a class has 15 and 10 as its highest and lowest values respectively, then the class interval = H – L = 15 – 10 = 5. b) When we have a data set at hand, we can find the class interval by the following formula:

Class interval $= \dfrac{H - L}{i}$ (H = highest value, L = lowest value, i = number of classes).

Exclusive method: In the exclusive method the upper limit of a class and the lower limit of the next class are the same. For example, the class 20 – 30 includes the values from 20 up to 29.99, while the value 30 and the values up to 39.99 are included in the class 30 – 40.

Marks      Number of Students
20 – 30    5
30 – 40    8
40 – 50    17
Total      30

Inclusive method: In this method no overlapping is permitted. The upper limit of any class and the lower limit of the next class are not the same, and both limits belong to the class. For example,

Wages (in Taka)    Frequency
50 – 56            4
57 – 63            3
64 – 70            7
71 – 77            8
Total              22

Class frequency: The number of observations corresponding to a particular class is known as the frequency of that class or the class frequency.
Frequency Density: The frequency density of a class is the frequency of that class per unit of its class interval. It is given by

Frequency density $= \dfrac{\text{Frequency of a certain class}}{\text{Class interval of that specific class}}$.

Class midpoint: For many mathematical operations we have to find the class midpoints. The midpoint of a class interval is its central value. The formula for the midpoint is

Class midpoint $= \dfrac{\text{Lowest value of the class} + \text{Highest value of the class}}{2}$.

Open ended class: An open ended class is a class in which at least one of the lower and upper limits is not defined. For example, a class 'less than 40' or a class 'over 90' is an open ended class. These are shown as <40, >90 or 90+.

Data arrangement: The arrangement of data in either ascending or descending order of magnitude is called an array. The data array is one of the simplest ways to present data. One advantage of an array is that we can easily find the range, that is, the difference between the highest and the lowest value; we can also easily divide the data into sections and observe the distance between succeeding values. Example: Data array of average inventory (in days) for convenience stores, in ascending order reading down each column:

2.0    3.8    4.1    4.8
3.4    4.0    4.2    4.9
3.4    4.1    4.3    4.9
3.8    4.1    4.7    5.2

Relative frequency distribution: A relative frequency distribution presents frequencies in terms of fractions or percentages. The sum of all relative frequencies equals 100% (or 1 when expressed as fractions), because a relative frequency distribution pairs each class with its appropriate fraction or percentage of the total data.



***Raw data: Information before it is arranged and analyzed is called raw data. It is “raw” because it is unprocessed by statistical methods.

Graphical Presentation of Data
A graph is a pictorial presentation of the relationship between variables. Many types of graphs are employed in statistics, depending on the nature of the data involved and the purpose for which the graph is intended. Among the different methods of graphical representation are:
For qualitative or categorical data:
a) Bar Diagram.
b) Pie Chart.
For quantitative or numerical data:
c) Histogram.
d) Frequency Polygon or Frequency Curve.
e) Cumulative Frequency Polygon/Curve or Ogive.
f) Graphs of Time Series or Line Graph.

Bar Diagram: A bar is a thick line whose width is shown merely for attention. Bar diagrams are called one-dimensional because only the length of the bar matters and not the width. When the number of observations is large, lines may be drawn instead of bars to economize space. The width of the bars and the gap between one bar and another should be uniform throughout. Bars may be vertical or horizontal; vertical bars are usually preferred because they give a better look. The following frequency distribution shows the religion of a sample of 500 individuals:

Religion    Frequency
Muslim      250
Hindu       125
Others      125
Total       500

Construction of Bar Diagram:
[Figure: Bar diagram of frequency (vertical axis, 0 to 300) by religion (horizontal axis): Muslim 250, Hindu 125, Others 125.]

Pie Chart: A pie chart is a circle divided into component sectors according to the break-up of the components given in percentages. The 360° angle at the centre is divided in proportion to the percentages. In a circle of the desired size, a horizontal radius is generally drawn and the calculated angles for the various components are constructed one after another with the help of a protractor. The following frequency distribution shows the religion of a sample of 500 individuals:

Religion    Frequency    Angle (θ)
Muslim      250          $360° \times \dfrac{250}{500} = 180°$
Hindu       125          $360° \times \dfrac{125}{500} = 90°$
Others      125          $360° \times \dfrac{125}{500} = 90°$
Total       500          θ = 360°

Construction of Pie Diagram:
[Figure: Pie diagram with sectors Muslim 250, Hindu 125, Others 125.]
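The sector angles of a pie chart follow directly from the rule "angle = 360° × frequency / total". A minimal sketch, with a hypothetical helper name, using the religion data from the table:

```python
def pie_angles(frequencies):
    """Return the sector angle in degrees for each category."""
    total = sum(frequencies.values())
    return {label: 360 * f / total for label, f in frequencies.items()}

religion = {"Muslim": 250, "Hindu": 125, "Others": 125}
angles = pie_angles(religion)
print(angles)   # {'Muslim': 180.0, 'Hindu': 90.0, 'Others': 90.0}
```

The angles always add up to 360°, which is a quick check on the arithmetic before drawing the chart.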

Consider the following frequency distribution:

Class Interval    Frequency
5 – 9             8
10 – 14           29
15 – 19           27
20 – 24           12
25 – 29           4

Show this frequency distribution in a histogram and a frequency polygon.

Histogram: This type of diagrammatic representation is best suited to frequency distributions with continuous class intervals, in which the upper limit of a class is the lower limit of the following class. The class intervals are plotted along the abscissa (horizontal axis) and the frequencies are measured along the ordinate (vertical axis) according to the chosen scale. A histogram is a series of rectangles, each proportional in width to the range of values within a class and proportional in height to the number of items falling in the class. If the classes in the frequency distribution are of equal width, then the vertical bars in the histogram are also of equal width, and the height of the bar for each class corresponds to the number of items in the class. The frequency distribution given above is used to construct a histogram, after converting the class limits to class boundaries:

Class Boundary    Frequency
4.5 – 9.5         8
9.5 – 14.5        29
14.5 – 19.5       27
19.5 – 24.5       12
24.5 – 29.5       4

Construction of Histogram:
[Figure: Histogram with frequency (0 to 35) on the vertical axis and class boundaries 4.5, 9.5, 14.5, 19.5, 24.5, 29.5 on the horizontal axis; bar heights 8, 29, 27, 12, 4.]

Class Limits: In a class interval such as 60 – 62, the end numbers 60 and 62 are called class limits. The smaller number is the 'lower class limit' (60) and the larger number is the 'upper class limit' (62).
Class Boundaries: A class boundary is obtained by adding the upper limit of one class interval to the lower limit of the next higher class interval and dividing by 2.
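The rule for class boundaries can be sketched as follows. The helper name is hypothetical, and the sketch assumes inclusive classes listed in ascending order with the same gap between every pair of neighbouring classes, so that the first lower boundary and the last upper boundary can be extended by the same half-gap.

```python
def class_boundaries(limits):
    """Convert inclusive class limits [(lower, upper), ...] to class boundaries.

    Each inner boundary is (upper limit + next lower limit) / 2, which equals
    shifting every limit outward by half the gap between neighbouring classes.
    """
    half_gap = (limits[1][0] - limits[0][1]) / 2
    return [(lower - half_gap, upper + half_gap) for lower, upper in limits]

limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
boundaries = class_boundaries(limits)
print(boundaries)
# [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5)]
```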

Frequency Polygon: A frequency polygon is another way to portray a frequency distribution. Frequencies are measured on the vertical axis and the values of the variable on the horizontal axis. Each class frequency is plotted by drawing a dot above the class midpoint, and the successive dots are connected with straight lines. The frequency distribution given above is used to construct a frequency polygon:

Class Boundary    Frequency    Midvalue
4.5 – 9.5         8            7
9.5 – 14.5        29           12
14.5 – 19.5       27           17
19.5 – 24.5       12           22
24.5 – 29.5       4            27
Total             80

Construction of Frequency Polygon:
[Figure: Frequency polygon with frequency (0 to 40) on the vertical axis and class midvalues 2, 7, 12, 17, 22, 27, 32 on the horizontal axis.]

Cumulative Frequency Polygon/Curve (Ogive):
A graph showing the cumulative frequency less than any upper class boundary, plotted against that upper class boundary, is called a cumulative frequency polygon or ogive (pronounced "oh-jive"). Cumulative frequencies are measured on the vertical axis and the class boundaries or 'less than' classes on the horizontal axis. Each cumulative frequency is plotted by drawing a dot above its 'less than' class, and the successive dots are connected with straight lines (polygon) or free hand (curve). An ogive is thus the graph of a cumulative frequency distribution. Let us consider the following frequency distribution.



Class Boundary    'Less than' Class    Frequency    Cumulative Frequency
                  Less than 1.45       0            0
1.45 – 1.95       Less than 1.95       2            2
1.95 – 2.45       Less than 2.45       1            3
2.45 – 2.95       Less than 2.95       4            7
2.95 – 3.45       Less than 3.45       15           22
3.45 – 3.95       Less than 3.95       10           32
3.95 – 4.45       Less than 4.45       5            37
4.45 – 4.95       Less than 4.95       3            40

Construction of Cumulative Frequency Polygon:
[Figure: Cumulative frequency polygon with cumulative frequency (0 to 50) on the vertical axis and the boundaries 1.45 to 4.95 on the horizontal axis; the points 0, 2, 3, 7, 22, 32, 37, 40 are joined by straight lines.]

Construction of Cumulative Frequency Curve:
[Figure: Cumulative frequency curve through the same points 0, 2, 3, 7, 22, 32, 37, 40, joined by a smooth free-hand curve.]
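The points of a 'less than' ogive are the running totals of the class frequencies paired with the upper class boundaries. A short sketch using the table above (the variable names are illustrative only):

```python
from itertools import accumulate

# Upper class boundaries and class frequencies from the table above;
# the first entry (1.45, frequency 0) anchors the ogive on the axis.
upper_boundaries = [1.45, 1.95, 2.45, 2.95, 3.45, 3.95, 4.45, 4.95]
frequencies = [0, 2, 1, 4, 15, 10, 5, 3]

cumulative = list(accumulate(frequencies))          # running totals
ogive_points = list(zip(upper_boundaries, cumulative))

for boundary, cf in ogive_points:
    print(f"Less than {boundary}: {cf}")
```

The last cumulative frequency must equal the total number of observations (40 here), which is a useful check before plotting.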

Graphs of Time Series or Line Graph: A line graph is particularly useful for showing numerical time series data. In this type of graph we have two variables under consideration. One variable is taken along the X-axis and the other along the Y-axis. The variable values are suitably scaled along the axes and all distances are measured from the origin. The independent variable should be taken on the X-axis and the dependent variable on the Y-axis. The points are plotted and joined by line segments in order. These graphs depict the trend or variability occurring in the data. The following data present the number of students admitted to Jagannath University during the last five years.

Year    Frequency
2005    1650
2006    1750
2007    1870
2008    1920
2009    2050

Construction of Line Graph:
[Figure: Line graph of the number of students admitted (vertical axis, 0 to 2500) against year (horizontal axis, 2005 to 2009), through the points 1650, 1750, 1870, 1920, 2050.]

Central Tendency
Definition: The central tendency of a set of data can be defined as the tendency of the different values to cluster around a central value, which is representative of all the other values in the data. The measures of central tendency (also known as 'measures of location' or 'averages') are the numerical values that locate, in some sense, the center of a set of data. An average is a value that is typical or representative of a set of data. Since such typical values tend to lie centrally within a set of data arranged according to magnitude, averages are also called measures of central tendency. They are used to identify a typical value that can describe the entire set.

Types of measures of central tendency: The most commonly used measures of central tendency are: a) Mean. b) Median. c) Mode.

Mean: There are three types of mean, each suitable for a particular type of data. They are,
Mean
   Arithmetic mean
   Geometric mean
   Harmonic mean

The Arithmetic mean: The 'arithmetic mean' is the sum of the values divided by the number of values. It is also popularly known as the average. The arithmetic mean is of two types:
1. Simple arithmetic mean
2. Weighted arithmetic mean
Simple arithmetic mean: Computing the mean of individual observations is very simple: add together the various values of the variable and divide the total by the number of items. The arithmetic mean, or briefly the mean, of a set of n numbers $x_1, x_2, x_3, \dots, x_n$ is denoted by $\bar{x}$ and is defined symbolically as

$\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$.

For a discrete series, the arithmetic mean of a set of values $x_1, x_2, x_3, \dots, x_n$ with frequencies $f_1, f_2, f_3, \dots, f_n$ and total frequency $N = \sum_{i=1}^{n} f_i$ is denoted by $\mu$ and is defined symbolically as

$\mu = \dfrac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \dots + x_n f_n}{N} = \dfrac{\sum_{i=1}^{n} x_i f_i}{N}$.

In a continuous series, the arithmetic mean may be computed by applying the following formula:

$\bar{x} = \dfrac{\sum_{i=1}^{n} f_i m_i}{N}$.

(m = midvalue of each class, f = frequency of each class, and N = total frequency).
Sample mean: The sample mean is called a statistic (a characteristic or measure obtained from a sample).
Population mean: The population mean is called a parameter (a characteristic or measure obtained from a population).

Example 1: Suppose we have the following observations: 10, 25, 30, 7, 42, 79 and 83.

∴ Arithmetic mean $= \dfrac{10 + 25 + 30 + 7 + 42 + 79 + 83}{7} = \dfrac{276}{7} \approx 39.43$

Example 2: Calculate the arithmetic mean from the following data:

Student Number    1    2    3    4    5    6    7    8    9    10
Marks Obtained    15   20   25   19   12   11   13   17   18   20

Solution: The student number only labels each observation, so the marks obtained are the values $x_i$ and n = 10.

$\sum x_i = 15 + 20 + 25 + 19 + 12 + 11 + 13 + 17 + 18 + 20 = 170$

∴ Arithmetic mean: $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{170}{10} = 17$

Example 3: The following table gives the wages paid to 125 workers in a factory. Calculate the arithmetic mean of the wages.

Wages (Tk.)          200   210   220   230   240   250   260
Number of Workers    5     15    32    42    15    12    4

Solution:

Wages ($x_i$)    Number of Workers ($f_i$)    $x_i f_i$
200              5                            1000
210              15                           3150
220              32                           7040
230              42                           9660
240              15                           3600
250              12                           3000
260              4                            1040
Total            $\sum f_i$ = N = 125         $\sum x_i f_i$ = 28490

∴ Average wage $= \dfrac{\sum x_i f_i}{N} = \dfrac{28490}{125} = 227.92$ Tk.



Example 4: Find the arithmetic mean from the following frequency distribution.

Class Interval    Frequency
11 – 15           12
16 – 20           14
21 – 25           13
26 – 30           11

Solution:

Class Interval    Class Midvalue ($x_i$)    Frequency ($f_i$)      $x_i f_i$
11 – 15           13                        12                     156
16 – 20           18                        14                     252
21 – 25           23                        13                     299
26 – 30           28                        11                     308
Total             $\sum x_i$ = 82           $\sum f_i$ = N = 50    $\sum x_i f_i$ = 1015

∴ Arithmetic mean: $\bar{x} = \dfrac{\sum x_i f_i}{N} = \dfrac{1015}{50} = 20.3$
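The frequency-table calculations in Examples 3 and 4 can be checked with a few lines of Python. The function name is hypothetical; for grouped data the class midvalues stand in for the values, exactly as in the solution tables.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of `values` weighted by their frequencies."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)

# Example 3: discrete series (wages and number of workers)
wages = [200, 210, 220, 230, 240, 250, 260]
workers = [5, 15, 32, 42, 15, 12, 4]
average_wage = frequency_mean(wages, workers)

# Example 4: grouped series, using class midvalues
classes = [(11, 15), (16, 20), (21, 25), (26, 30)]
midvalues = [(lower + upper) / 2 for lower, upper in classes]
grouped_mean = frequency_mean(midvalues, [12, 14, 13, 11])

print(round(average_wage, 2))   # 227.92
print(round(grouped_mean, 2))   # 20.3
```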

Example 5: A set of data consists of the five values 5, 9, 8, 6 and 10. Find the mean.
Solution: The mean of the given set of data is

∴ $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{5 + 9 + 8 + 6 + 10}{5} = \dfrac{38}{5} = 7.6$

Example 6: The number of children in 40 families was found as:

Number of Children    Frequency
0                     3
1                     7
2                     10
3                     8
4                     6
5                     4
6                     2
Total                 40

Find the mean.

Solution:

Number of Children ($x_i$)    Frequency ($f_i$)    $x_i f_i$
0                             3                    0
1                             7                    7
2                             10                   20
3                             8                    24
4                             6                    24
5                             4                    20
6                             2                    12
Total                         N = 40               $\sum x_i f_i$ = 107

∴ Arithmetic mean (AM) $= \dfrac{\sum_{i=1}^{n} x_i f_i}{N} = \dfrac{107}{40} = 2.675$

Hence, the mean number of children per family is 2.675, or about 3.

Weighted Arithmetic mean: The weighted arithmetic mean is commonly used in the construction of index numbers. The term 'weight' stands for the relative importance of the different items. There are situations where the relative importance of the different items is not the same; in that case we compute the weighted arithmetic mean. We associate with the numbers $x_1, x_2, x_3, \dots, x_n$ certain weighting factors $w_1, w_2, w_3, \dots, w_n$ depending on the significance or importance attached to the numbers. Symbolically,

∴ $\bar{x} = \dfrac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n}{w_1 + w_2 + w_3 + \dots + w_n} = \dfrac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$.

Example 1: The unit prices and the quantities of 3 food items consumed by a family in July 2009 were as follows:

Food items    Price in taka per kg    Quantity consumed
Rice          18                      32
Sugar         32                      3
Potato        12                      9

Compute the average price paid by the family on these items.

Solution:

Food items    Price in taka per kg ($x_i$)    Quantity consumed ($w_i$)    Total price paid ($w_i x_i$)
Rice          18                              32                           576
Sugar         32                              3                            96
Potato        12                              9                            108
Total                                         $\sum w_i$ = 44              $\sum w_i x_i$ = 780

Hence, the weighted arithmetic mean $= \dfrac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \dfrac{780}{44} = 17.73$ taka per kg.
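The weighted arithmetic mean of this example can be reproduced with a small sketch (the function name is hypothetical). The quantities consumed act as the weights and the prices as the values:

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

prices = [18, 32, 12]       # taka per kg: rice, sugar, potato
quantities = [32, 3, 9]     # kg consumed (the weights)

average_price = weighted_mean(prices, quantities)
print(round(average_price, 2))   # 17.73
```

For comparison, the simple mean of the three prices is (18 + 32 + 12) / 3 ≈ 20.67; the weighted mean is lower because most of the quantity bought was rice, the cheaper of the two heavily weighted items.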

The Harmonic Mean (HM): The harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the observations. Equivalently, it is the number of values divided by the sum of the reciprocals of the values. It cannot be calculated when any observation is zero. For a set of n numbers $x_1, x_2, x_3, \dots, x_n$,

Harmonic mean (HM) $= \dfrac{\text{Number of values}}{\text{Sum of the reciprocals of each value}} = \dfrac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \dots + \dfrac{1}{x_n}} = \dfrac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}$.

For a discrete series, HM $= \dfrac{N}{\sum_{i=1}^{n} \dfrac{f_i}{x_i}}$.

For a continuous series or grouped data, HM $= \dfrac{N}{\sum_{i=1}^{n} \dfrac{f_i}{m_i}}$.

(m = midvalue of each class, f = frequency of each class, and N = total frequency.)

*** The harmonic mean is a suitable measure of central tendency when the data pertain to average speeds, rates and times.

Example 1: A person drove to Uttara from Sador ghat at 40 km per hour and returned at 60 km per hour. Find the average speed.

Solution: The average speed is obtained by using the harmonic mean:

∴ HM $= \dfrac{2}{\frac{1}{40} + \frac{1}{60}} = 48$ km per hour.
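A short Python sketch of the same calculation follows. The function name `harmonic_mean` is an assumption chosen for illustration, and the example relies on the outward and return trips covering the same distance, which is what makes the harmonic mean the appropriate average here.

```python
def harmonic_mean(values):
    """Return n divided by the sum of reciprocals of the values."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("the harmonic mean needs strictly positive values")
    return len(values) / sum(1 / v for v in values)


speeds = [40, 60]  # km per hour, outward and return trip over the same route
average_speed = harmonic_mean(speeds)
print(round(average_speed, 2))
```

Note that the simple arithmetic mean of 40 and 60 would give 50, which overstates the average speed because more time is spent travelling at the slower speed.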



Example 2: Calculate the harmonic mean from the following data:

Obtained marks       30 – 40   40 – 50   50 – 60   60 – 70   70 – 80   80 – 90   90 – 100
Number of students   15        13        8         6         15        7         6

Solution: Table for calculation of harmonic mean:

Obtained marks   Midvalue ($x_i$)   Number of students ($f_i$)   $f_i / x_i$
30 – 40          35                 15                           0.42857
40 – 50          45                 13                           0.28889
50 – 60          55                 8                            0.14545
60 – 70          65                 6                            0.09231
70 – 80          75                 15                           0.20000
80 – 90          85                 7                            0.08235
90 – 100         95                 6                            0.06316
Total                               N = 70                       $\sum f_i / x_i = 1.30073$

∴ Harmonic mean (HM) $= \dfrac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}} = \dfrac{70}{1.30073} = 53.82$
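The grouped calculation can be sketched in Python as below. The helper name `grouped_harmonic_mean` is hypothetical. Because the code carries full precision instead of rounding each ratio, its result (about 53.82) can differ in the last digit from a hand calculation based on rounded table entries.

```python
def grouped_harmonic_mean(midvalues, frequencies):
    """Return N / sum(f_i / m_i) for a grouped frequency distribution."""
    if len(midvalues) != len(frequencies):
        raise ValueError("midvalues and frequencies must have the same length")
    total_frequency = sum(frequencies)
    reciprocal_sum = sum(f / m for m, f in zip(midvalues, frequencies))
    return total_frequency / reciprocal_sum


midvalues = [35, 45, 55, 65, 75, 85, 95]   # class midpoints of the marks
frequencies = [15, 13, 8, 6, 15, 7, 6]     # number of students per class
hm_marks = grouped_harmonic_mean(midvalues, frequencies)
print(round(hm_marks, 2))
```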

Geometric mean (GM): The geometric mean of n positive values is defined as the nth root of the product of the n values. Like the arithmetic mean, it depends on all the observations. It is affected by extreme values, but not to the same extent as the arithmetic mean. If any one of the observations is zero or negative, the geometric mean cannot be computed. If there are two items, we take the square root; if there are three items, the cube root; and so on.

The geometric mean of a set of n positive numbers $x_1, x_2, x_3, \dots, x_n$ is the nth root of the product of the numbers:

∴ Geometric mean (GM) $= \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$

When the number of items is three or more, multiplying the numbers and extracting the root becomes excessively difficult. To simplify the calculation, logarithms are used. The geometric mean is then calculated as follows:

log GM $= \dfrac{\log x_1 + \log x_2 + \log x_3 + \dots + \log x_n}{n} = \dfrac{\sum_{i=1}^{n} \log x_i}{n}$

And GM $= \text{Antilog}\left(\dfrac{\sum_{i=1}^{n} \log x_i}{n}\right)$

For discrete series, GM $= \text{Antilog}\left(\dfrac{\sum_{i=1}^{n} f_i \log x_i}{N}\right)$

For continuous series, GM $= \text{Antilog}\left(\dfrac{\sum_{i=1}^{n} f_i \log m_i}{N}\right)$

(m = midvalue of each class, f = frequency of each class, and N = total frequency.)

*** The geometric mean is useful in finding the average of percentages, ratios, indexes, or growth rates.

Example 1: Find the geometric mean of 4, 16 and 27.

Solution: The geometric mean of the given values is

GM $= \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n} = \sqrt[3]{4 \times 16 \times 27} = \sqrt[3]{1728} = 12$.
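The logarithm route described above can be sketched in Python. The function name `geometric_mean` is an assumption for illustration; base-10 logarithms are used to mirror the "log / antilog" wording, although any base gives the same result.

```python
import math


def geometric_mean(values):
    """Return antilog(mean of log10 of the values); values must be positive."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs strictly positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log  # the antilog step


gm_example = geometric_mean([4, 16, 27])
print(round(gm_example, 4))
```

Working with logarithms avoids forming the full product, which can overflow or lose precision when there are many observations.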



Example 2: If a person receives a 25% raise after one year of service and a 15% raise after the second year of service, find the average percentage raise per year.

Solution: At the end of the first year the salary of the person is (100 + 25) = 125% of the previous salary. At the end of the second year the salary of the person is (100 + 15) = 115% of the previous salary. Using the geometric mean, the average of these percentages is

GM $= \sqrt{125 \times 115} = 119.90\%$

∴ The average percentage raise per year = 119.90% − 100% = 19.90%.

Example 3: The growth rate of the population of a country in three successive years was 3.0%, 2.5% and 2.1% respectively. Find the average population growth.

Solution: At the end of the first year the population is (100 + 3) = 103% of the previous level. At the end of the second year it is (100 + 2.5) = 102.5% of the previous level. At the end of the third year it is (100 + 2.1) = 102.1% of the previous level. The geometric mean of these percentages is

GM $= \sqrt[3]{103 \times 102.5 \times 102.1} = 102.53\%$

∴ The average population growth rate per year = 2.53%.
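Both growth examples follow the same recipe: turn each percentage change into a growth factor, take the geometric mean of the factors, and convert back to a percentage. A minimal Python sketch, with the hypothetical helper name `average_growth_rate`:

```python
import math


def average_growth_rate(percent_changes):
    """Return the average percentage change per period via the geometric mean."""
    if not percent_changes:
        raise ValueError("at least one percentage change is required")
    factors = [1 + p / 100 for p in percent_changes]   # e.g. 25% -> 1.25
    mean_factor = math.prod(factors) ** (1 / len(factors))
    return (mean_factor - 1) * 100


salary_raise = average_growth_rate([25, 15])
population_growth = average_growth_rate([3.0, 2.5, 2.1])
print(round(salary_raise, 2), round(population_growth, 2))
```

The arithmetic mean of 25% and 15% would be 20%, which slightly overstates the compound effect; the geometric mean gives the constant yearly rate that produces the same final salary.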

Median (Me): The median of a set of numbers arranged in order of magnitude is either the middle value or the arithmetic mean of the two middle values. In other words, when the observations are arranged in ascending or descending order, the median is the value that has half of the observations below it and the remaining half above it.

For discrete series,

** When n is odd, Me = size of the $\left(\dfrac{n+1}{2}\right)$th item.

** When n is even, Me = average of the $\dfrac{n}{2}$th and $\left(\dfrac{n}{2} + 1\right)$th observations.

Or, Me $= \dfrac{1}{2}\left(\dfrac{n}{2}\text{th score} + \dfrac{n+2}{2}\text{th score}\right)$.

For continuous series or grouped data, Me $= L_0 + \dfrac{\frac{N}{2} - f_c}{f_m} \times i$

Where,
$L_0$ = Lower limit of the class interval containing the median.
N = Total number of observations.
$f_c$ = Cumulative frequency of the pre-median class.
$f_m$ = Frequency of the median class.
i = Width of the median class.

Note:
*** The median class is identified from the cumulative frequency column on the basis of the value of $\dfrac{N}{2}$.
*** The median cannot be used for nominal data because ranking is not possible.
*** The advantage of using the median as a measure of central tendency is that it is not influenced by outliers. Consequently, the median is preferred over the mean as a measure of central tendency for data sets that contain outliers.



Example 1: Find the median for the following set of data: 16, 32, 20, 13, 13, 24, 10

Solution: Here n = 7, which is odd. Arranging the data in ascending order we have 10, 13, 13, 16, 20, 24, 32.

∴ Median $= \dfrac{n+1}{2}$th score
$= \dfrac{7+1}{2}$th score
= 4th score
= 16

Example 2: Find the median for the following set of data: 17, 18, 26, 30, 19, 24, 20, 22, 29, 25

Solution: Here n = 10, which is even. Arranging the data in ascending order we have 17, 18, 19, 20, 22, 24, 25, 26, 29, 30.

∴ Median $= \dfrac{1}{2}\left(\dfrac{n}{2}\text{th score} + \dfrac{n+2}{2}\text{th score}\right)$
$= \dfrac{1}{2}\left(\dfrac{10}{2}\text{th score} + \dfrac{10+2}{2}\text{th score}\right)$
$= \dfrac{1}{2}$(5th score + 6th score)
$= \dfrac{1}{2}(22 + 24)$

= 23 (Ans)

Example 3: Find the median for the following data:

x    1    2    3    4
f    5    9    8    6

Solution:

x        f         cf
1        5         5
2        9         14
3        8         22
4        6         28
         N = 28

Here N = 28 (even).

∴ Median $= \dfrac{1}{2}\left(\dfrac{N}{2}\text{th score} + \dfrac{N+2}{2}\text{th score}\right)$
$= \dfrac{1}{2}\left(\dfrac{28}{2}\text{th score} + \dfrac{28+2}{2}\text{th score}\right)$
$= \dfrac{1}{2}$(14th score + 15th score)
$= \dfrac{1}{2}(2 + 3)$



= 2.5 (Ans)

Example 4: Find the median for the following data:

x    11 – 15    16 – 20    21 – 25    26 – 30
f    12         14         13         11

Solution:

Class       Class boundary    f         cf
11 – 15     10.5 – 15.5       12        12
16 – 20     15.5 – 20.5       14        26
21 – 25     20.5 – 25.5       13        39
26 – 30     25.5 – 30.5       11        50
                              N = 50

Here, Median = $\dfrac{N}{2}$th observation = 25th observation, which lies in the class 15.5 – 20.5.

Median class = 15.5 – 20.5, $L_0$ = 15.5, $\dfrac{N}{2}$ = 25, $f_c$ = 12, $f_m$ = 14, i = 20.5 – 15.5 = 5

∴ Me $= L_0 + \dfrac{\frac{N}{2} - f_c}{f_m} \times i = 15.5 + \dfrac{25 - 12}{14} \times 5 = 15.5 + 4.64 = 20.14$ (Ans)
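The grouped-median formula can be sketched in Python as follows. The function name `grouped_median` and its argument layout (lower class boundaries, a common class width, and frequencies) are assumptions made for this illustration.

```python
def grouped_median(lower_boundaries, width, frequencies):
    """Return L0 + (N/2 - fc) / fm * i for a grouped frequency distribution."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cumulative = 0  # cumulative frequency before the current class (fc)
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            # This class contains the (N/2)th observation: the median class.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


boundaries = [10.5, 15.5, 20.5, 25.5]   # lower class boundaries
class_frequencies = [12, 14, 13, 11]
median_value = grouped_median(boundaries, 5, class_frequencies)
print(round(median_value, 2))
```

The loop mirrors the manual procedure: it walks down the cumulative frequency column until it reaches the class that contains the (N/2)th observation, then interpolates inside that class.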

Mode (Mo): The mode of a set of numbers is the value which occurs with the greatest frequency, that is, the most common value.

In continuous series or grouped data, Mo $= L_0 + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times c$

Where,
$L_0$ = Lower limit of the modal class, i.e. the class for which the frequency is maximum.
$\Delta_1$ = The difference between the frequency of the modal class and the pre-modal class.
$\Delta_2$ = The difference between the frequency of the modal class and the post-modal class.
c = The length of the modal class.

Note: A set of data can have more than one mode or no mode at all. If the data are grouped in intervals, then the interval that has the highest frequency is called the modal class and its midpoint is called the crude mode.



The mode is the only measure of central tendency that can be used to find the most typical case when the data are categorical or nominal.

Example 1: Find the mode for the following sets of data:
a) 3, 9, 4, 5, 6, 7, 6, 8, 6, 9, 2
b) 3, 5, 8, 10, 12, 15 and 16
c) 45, 55, 50, 45, 40, 55, 45, 55

Solution:
a) Since 6 occurs the maximum number of times (3 times), the mode is 6. A distribution having only one mode is called unimodal.
b) Since each value occurs only once, there is no mode. It cannot be said, however, that the mode is 'zero'.
c) Since both 45 and 55 occur the maximum number of times (3 times), the modes are 45 and 55. This set of data is said to be bimodal.

Example 2: The following data are the weights of 100 persons. Determine the modal weight.

Weight (kg)          58   60   61   62   63   64   65   66   68   70
Number of persons    4    6    5    10   20   22   24   6    2    1

Solution: Mode = 65 kg (since it has the maximum frequency).

Example 3: Find the mode for the following distribution.

Class        Frequency
11 – 15      12
16 – 20      14
21 – 25      13
26 – 30      11

Solution: Modal class = 16 – 20 (since it has the maximum frequency).

The crude mode $= \dfrac{16 + 20}{2} = 18$ (midpoint of the modal class).

Example 4: A survey shows the following distribution of the number of students choosing a major in each field. Find the mode.

Major               Number of students
Finance             1550
Accounting          1125
Management          862
Marketing           328
Hotel Management    135
Total               4000

Solution: Since the category with the highest frequency is 'Finance', the most typical case is the 'Finance' major.
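Counting occurrences is all that is needed to find the mode of raw or categorical data. The sketch below uses Python's standard `collections.Counter`; the helper name `modes` is hypothetical, and it adopts the convention used in the examples above that a data set in which every value occurs only once has no mode.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of modes; empty if every value occurs only once."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 9, 4, 5, 6, 7, 6, 8, 6, 9, 2]))        # unimodal data
print(modes([3, 5, 8, 10, 12, 15, 16]))                # no mode
print(modes([45, 55, 50, 45, 40, 55, 45, 55]))         # bimodal data

# For a frequency table of categories, the mode is the key with the largest count.
majors = {"Finance": 1550, "Accounting": 1125, "Management": 862,
          "Marketing": 328, "Hotel Management": 135}
modal_major = max(majors, key=majors.get)
print(modal_major)
```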

Quartiles: Quartiles are the values in a set of observations, arranged in order of magnitude, which divide the total observations into quarters. That is, the quartiles divide the data into four equal parts.

Let $x_1, x_2, x_3, \dots, x_n$ be a series of n observations arranged in order of magnitude.

For ungrouped data, the ith quartile is

$Q_i = \dfrac{(n+1)\, i}{4}$th observation (when n is odd),

and $Q_i = \dfrac{\dfrac{n \times i}{4}\text{th observation} + \left(\dfrac{n \times i}{4} + 1\right)\text{th observation}}{2}$ (when n is even).

For grouped data, $Q_i = L_0 + \dfrac{\frac{N \times i}{4} - f_c}{f_q} \times c$;  i = 1, 2, 3.

Where,
$L_0$ = Lower limit of the ith quartile class.
N = Total number of observations.
$f_c$ = Cumulative frequency of the pre-ith quartile class.
$f_q$ = Frequency of the ith quartile class.
c = Class interval of the ith quartile class.

Deciles: The deciles of a set of observations are those values which divide the total observations into 10 equal parts.

Let $x_1, x_2, x_3, \dots, x_n$ be a series of n observations arranged in order of magnitude.

For ungrouped data, the ith decile is

$D_i = \dfrac{(n+1)\, i}{10}$th observation; [i = 1, 2, …, 9] (when n is odd),

and $D_i = \dfrac{\dfrac{n \times i}{10}\text{th item} + \left(\dfrac{n \times i}{10} + 1\right)\text{th item}}{2}$; [i = 1, 2, …, 9] (when n is even).

For grouped data, $D_i = L_0 + \dfrac{\frac{N \times i}{10} - f_c}{f_d} \times c$;  [i = 1, 2, …, 9]

Where,
$L_0$ = Lower limit of the ith decile class.
N = Total number of observations.
$f_c$ = Cumulative frequency of the pre-ith decile class.
$f_d$ = Frequency of the ith decile class.
c = Class interval of the ith decile class.

Percentiles: The percentiles of a set of data are those values which divide the total observations into 100 equal parts.

Let $x_1, x_2, x_3, \dots, x_n$ be a series of n observations arranged in order of magnitude.

For ungrouped data, the ith percentile is

$P_i = \dfrac{(n+1)\, i}{100}$th observation; [i = 1, 2, …, 99] (when n is odd),

and $P_i = \dfrac{\dfrac{n \times i}{100}\text{th observation} + \left(\dfrac{n \times i}{100} + 1\right)\text{th observation}}{2}$; [i = 1, 2, …, 99] (when n is even).

For grouped data, $P_i = L_0 + \dfrac{\frac{N \times i}{100} - f_c}{f_p} \times c$;  [i = 1, 2, …, 99]

Where,
$L_0$ = Lower limit of the ith percentile class.
N = Total number of observations.
$f_c$ = Cumulative frequency of the pre-ith percentile class.
$f_p$ = Frequency of the ith percentile class.
c = Class interval of the ith percentile class.
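The grouped-data formulas for quartiles, deciles and percentiles differ only in the divisor (4, 10 or 100), so one function can cover all three. The sketch below is illustrative: the name `grouped_quantile` and its parameters are assumptions, and it reuses the frequency table from the grouped median example (class boundaries 10.5 – 30.5, width 5).

```python
def grouped_quantile(i, parts, lower_boundaries, width, frequencies):
    """Return L0 + (N*i/parts - fc) / f * c for grouped data.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    if not 0 < i < parts:
        raise ValueError("i must lie strictly between 0 and parts")
    total = sum(frequencies)
    target = total * i / parts     # position N*i/parts in the ordered data
    cumulative = 0                 # cumulative frequency before this class
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("quantile class not found")


lower_limits = [10.5, 15.5, 20.5, 25.5]
freqs = [12, 14, 13, 11]
q1 = grouped_quantile(1, 4, lower_limits, 5, freqs)
q2 = grouped_quantile(2, 4, lower_limits, 5, freqs)
q3 = grouped_quantile(3, 4, lower_limits, 5, freqs)
d5 = grouped_quantile(5, 10, lower_limits, 5, freqs)
p50 = grouped_quantile(50, 100, lower_limits, 5, freqs)
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

The second quartile, the fifth decile and the fiftieth percentile all coincide with the median, which gives a quick consistency check.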

Round-off Rule: The measures of central tendency should be rounded to one more decimal place than occurs in the original data. To avoid round-off buildup, round off only the final answer, not the intermediate steps.

Coding Method: A linear transformation of data may be regarded as coding. In coding we shift the origin and the scale. A change can involve a change of origin, a change of scale, or a change of both origin and scale together. The effect of coding on the mean is given below:
1. If we subtract an arbitrary constant from each of the observations, the mean is also reduced by that constant.
2. If we divide each observation of a set by an arbitrary constant, the mean is divided by the same constant.

It is a very short method and should be used for grouped data where the class-interval sizes are equal. The calculation can be simplified by taking the deviations of the given values $x_i$ from an arbitrary value A (origin) and dividing by c (scale) in the case of a grouped frequency distribution. In the coding method the values of the variable 'x' are transformed into the values of the variable 'u' according to

$u_i = \dfrac{x_i - A}{c}$, so that $\bar{x} = A + c\,\bar{u}$.

Note: In the case of addition or multiplication, the word 'reduced' (or 'divided') should be replaced by 'increased' (or 'multiplied') in the above statements. These two operations shorten the calculation. The availability of electronic calculators and computers has diminished the importance of coding of data, but it can still be used whenever needed.
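A small Python sketch shows that the coded mean recovers the ordinary mean. The function name `coded_mean` is hypothetical, and the choice of origin A = 27 and scale c = 3 is an assumption made for illustration; the data are the midvalues and frequencies of the leaf-length table used in the variance example of these notes.

```python
def coded_mean(midvalues, frequencies, origin, scale):
    """Return A + c * u_bar, where u_i = (x_i - A) / c."""
    n = sum(frequencies)
    coded = [(x - origin) / scale for x in midvalues]          # u_i values
    u_bar = sum(f * u for f, u in zip(frequencies, coded)) / n
    return origin + scale * u_bar


leaf_midvalues = [21, 24, 27, 30, 33]
leaf_frequencies = [3, 6, 12, 9, 2]

mean_by_coding = coded_mean(leaf_midvalues, leaf_frequencies, origin=27, scale=3)
mean_direct = (sum(f * x for x, f in zip(leaf_midvalues, leaf_frequencies))
               / sum(leaf_frequencies))
print(mean_by_coding, mean_direct)
```

With this origin and scale the coded values are simply -2, -1, 0, 1, 2, which is what makes the method convenient for hand calculation.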

Properties of a good Average (Measure of Central Tendency): There are various measures of central tendency. The difficulty lies in choosing among them, as there are no hard and fast rules for selecting any one. A measure of central tendency is good or satisfactory if it possesses the following characteristics:
1. It should be easy to understand.
2. It should be simple to compute.
3. It should be based on all the observations.
4. It should be rigidly defined.
5. It should be capable of further algebraic treatment.
6. It should have sampling stability.
7. It should not be unduly affected by the presence of extreme values.

The arithmetic mean is the most popular and widely used measure of central tendency because it meets the first six of the above properties. From the comparison of the measures of central tendency, we can say that the arithmetic mean is a good measure of central tendency.

Limitations of the arithmetic mean:
- It is highly affected by extreme values.
- It cannot be used for categorical data.
- It cannot be used for an open-end class of a frequency distribution.
- It is not a suitable measure when the distribution is skewed.

Properties of the arithmetic mean:
1. The algebraic sum of the deviations of a set of numbers from their arithmetic mean is zero, i.e. $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
2. The sum of the squares of the deviations of the given values from their arithmetic mean is a minimum, i.e. $\sum_{i=1}^{n} (x_i - \bar{x})^2$ is a minimum.

Pooled or Combined mean: If we have the arithmetic means $\bar{x}_1$ and $\bar{x}_2$ of two groups (having the same unit of measurement of a variable), based on $n_1$ and $n_2$ observations respectively, we can compute the mean $\bar{x}_{12}$ of the two groups taken together from the individual means by the formula

$\bar{x}_{12} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$.

The advantage of this formula is that we do not have to repeat the entire calculation of the mean for the combined set of observations. Moreover, the formula for two groups can be extended to any number of groups.

Example: In a class there are 7 boys and 3 girls. The average age of the boys is 20 years and that of the girls is 18 years. Find the average age of all the students.

Solution: The combined mean age of all the students is

$\bar{x}_c = \dfrac{7 \times 20 + 3 \times 18}{10} = \dfrac{194}{10} = 19.4$ years.
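The combined-mean formula extends naturally to any number of groups, as the following Python sketch illustrates. The function name `combined_mean` is an assumption chosen for this example.

```python
def combined_mean(group_means, group_sizes):
    """Return sum(n_k * mean_k) / sum(n_k) over all groups."""
    if len(group_means) != len(group_sizes):
        raise ValueError("each group needs both a mean and a size")
    total_size = sum(group_sizes)
    if total_size == 0:
        raise ValueError("total group size must be positive")
    weighted_total = sum(n * m for m, n in zip(group_means, group_sizes))
    return weighted_total / total_size


# 7 boys with mean age 20 and 3 girls with mean age 18.
class_mean_age = combined_mean([20, 18], [7, 3])
print(class_mean_age)
```

The combined mean is a weighted mean of the group means, with the group sizes as weights, so it always lies between the smallest and largest group mean.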

Note: *** When any statistical measurement is analyzed from Sample then small ‘n’ is used. *** When any statistical measurement is analyzed from Population data then capital ‘N’ is used.

Relation among Mean, Median and Mode: Karl Pearson expressed the following empirical relationship among the mean, median and mode for a moderately skewed distribution:

Mean − Mode = 3[Mean − Median]
or Mode = 3 Median − 2 Mean
or Median = Mode + $\dfrac{2}{3}$[Mean − Mode].


Measures of Dispersion

Definition of Dispersion/ Variation/ Scatter/ Spread: The degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data. The measurement of the scatter of the values of a data set among themselves is called a measure of dispersion or variation. It tells how the values are dispersed or spread among themselves. In other words, the distance of the different values from the central value is called dispersion. The measures of dispersion are called averages of the second order.

Measures of Dispersion: The measures of dispersion can be distinguished by two major categories, a) Absolute measures of dispersion. b) Relative measures of dispersion.

Absolute Measure of Dispersion: When dispersion is measured in the original units it is known as absolute dispersion. The five important absolute measures of dispersion are as follows:
a) Range (R).
b) Quartile Deviation (QD).
c) Mean Deviation (MD).
d) Variance ($\sigma^2$).
e) Standard Deviation ($\sqrt{\sigma^2}$ = S).

The Relative Measure of Dispersion: A relative measure of dispersion is independent of the original units. Generally, relative measures of dispersion are expressed in terms of a ratio, percentage etc. The relative measures of dispersion are as follows:
I. Coefficient of Range.
II. Coefficient of Quartile Deviation.
III. Coefficient of Mean Deviation.
IV. Coefficient of Variation (C.V.).

Range: Range is the simplest method of studying dispersion. The range of a set of numbers is the difference between the largest and smallest numbers in the set. Range = L − S [L = Largest item, S = Smallest item].

Quartile Deviation: The quartile deviation is another type of range obtained from the quartiles. The inter-quartile range represents the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$). In other words, it is the range which includes the middle 50 percent of the observations. Symbolically,

Inter-quartile range $= Q_3 - Q_1$

The inter-quartile range is reduced to the semi inter-quartile range, or quartile deviation, by dividing it by 2. Symbolically,

Semi inter-quartile range or Quartile Deviation $= \dfrac{Q_3 - Q_1}{2}$.

∴ The quartile deviation is defined as QD $= \dfrac{1}{2}(Q_3 - Q_1)$   [∵ $Q_1$ = 1st Quartile, $Q_3$ = 3rd Quartile]


Rules for calculating $Q_1$ and $Q_3$:

a) If n is divisible by 4 ($\dfrac{n}{4}$ is an integer), then $Q_1 = \dfrac{1}{2}\left\{\dfrac{n}{4}\text{th} + \left(\dfrac{n}{4} + 1\right)\text{th}\right\}$ ordered value.

b) If n is not exactly divisible by 4 ($\dfrac{n}{4}$ is not an integer), then $Q_1$ is the ordered value whose position is the next higher integer above $\dfrac{n}{4}$.

c) For $Q_3$, only replace $\dfrac{n}{4}$ by $\dfrac{3n}{4}$.

Note: *** To calculate the quartile deviation, the data must be arranged in ascending order.

For ungrouped data:

Example: Calculate the quartile deviation from the following raw data: 32 14 19 17 23 40 80 54 59 27 71 48

Solution: Arranging the data values in ascending order: 14 17 19 23 27 32 40 48 54 59 71 80

Here n = 12.

∴ $\dfrac{n}{4} = \dfrac{12}{4} = 3$ [which is an integer]

∴ $Q_1$ = Average of the 3rd and 4th observations $= \dfrac{19 + 23}{2} = 21$.

For $Q_3$, $\dfrac{3n}{4} = 3 \times \dfrac{12}{4} = 9$.

∴ $Q_3$ = Average of the 9th and 10th observations $= \dfrac{54 + 59}{2} = 56.5$

∴ QD $= \dfrac{1}{2}(Q_3 - Q_1) = \dfrac{1}{2}(56.5 - 21) = 17.75$ Ans.
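The rules for locating the first and third quartiles can be sketched in Python as follows. The function names are hypothetical, and the code implements only the positional rule stated in these notes (other textbooks and software packages use slightly different quartile conventions and may give different values).

```python
import math


def quartile_by_rule(values, k):
    """Return Q_k (k = 1 or 3) using the n/4 and 3n/4 positional rule."""
    ordered = sorted(values)           # data must be in ascending order
    n = len(ordered)
    position = k * n / 4               # 1-based position in the ordered data
    if position == int(position):
        p = int(position)
        # Position is an integer: average the p-th and (p+1)-th values.
        return (ordered[p - 1] + ordered[p]) / 2
    # Otherwise take the value at the next higher integer position.
    return ordered[math.ceil(position) - 1]


def quartile_deviation(values):
    """Return (Q3 - Q1) / 2."""
    return (quartile_by_rule(values, 3) - quartile_by_rule(values, 1)) / 2


raw_data = [32, 14, 19, 17, 23, 40, 80, 54, 59, 27, 71, 48]
q1_value = quartile_by_rule(raw_data, 1)
q3_value = quartile_by_rule(raw_data, 3)
qd_value = quartile_deviation(raw_data)
print(q1_value, q3_value, qd_value)
```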

Mean Deviation: Mean deviation is obtained by calculating the absolute deviations of each observation from the mean, median or mode and then averaging these deviations by taking their arithmetic mean.

Rules for Calculating Mean Deviation: Let $x_1, x_2, x_3, \dots, x_n$ be a series of n observations; then the mean deviation can be expressed as follows.

Mean deviation from mean:
1) For ungrouped data, M.D.($\bar{x}$) $= \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$;  [$\bar{x}$ = Arithmetic mean]
2) For grouped data, M.D.($\bar{x}$) $= \dfrac{\sum_{i=1}^{n} f_i |x_i - \bar{x}|}{n}$;  [$\sum_{i=1}^{n} f_i = n$]

Mean deviation from median:
1) For ungrouped data, M.D.(Me) $= \dfrac{\sum_{i=1}^{n} |x_i - Me|}{n}$;  [Me = Median]
2) For grouped data, M.D.(Me) $= \dfrac{\sum_{i=1}^{n} f_i |x_i - Me|}{n}$;  [$\sum_{i=1}^{n} f_i = n$]

Mean deviation from mode:
1) For ungrouped data, M.D.(Mo) $= \dfrac{\sum_{i=1}^{n} |x_i - Mo|}{n}$;  [Mo = Mode]
2) For grouped data, M.D.(Mo) $= \dfrac{\sum_{i=1}^{n} f_i |x_i - Mo|}{n}$;  [$\sum_{i=1}^{n} f_i = n$]


Variance: Variance is the square of the standard deviation. In other words, variance is the arithmetic mean of the squared deviations from the mean of the distribution. Variance is generally denoted by '$\sigma^2$' (sigma square) for population data and '$S^2$' for sample data.

Rules for Calculating Variance: Let $x_1, x_2, x_3, \dots, x_n$ be a series of n observations; then the variance can be expressed as follows.

For ungrouped data, $S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \dfrac{\sum_{i=1}^{n} x_i^2}{n} - \left(\dfrac{\sum_{i=1}^{n} x_i}{n}\right)^2$.

For grouped data, $S^2 = \dfrac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n}$;  [$\sum_{i=1}^{n} f_i = n$]

And if population data are used, then $\sigma^2 = \dfrac{\sum_{i=1}^{n} f_i (x_i - \mu)^2}{N}$.

Standard Deviation: The standard deviation is the positive square root of the mean of the squared deviations of a set of values from their mean. The standard deviation is the most important measure of dispersion. It is an improvement over the mean deviation. The standard deviation is also known as the root mean square deviation, because it is the square root of the mean of the squared deviations from the arithmetic mean. It is generally denoted by the small Greek letter '$\sigma$' (sigma) for population data and 'S' for sample data, and is expressed by

For ungrouped data, S $= \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$

For grouped data, S $= \sqrt{\dfrac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n}}$

Example: Find the mean deviation, variance and standard deviation for the following data: 5 8 10 12 15 16 20 22 24 25

Solution:

$x_i$                 $x_i^2$                 $|x_i - \bar{x}|$
5                     25                      10.7
8                     64                      7.7
10                    100                     5.7
12                    144                     3.7
15                    225                     0.7
16                    256                     0.3
20                    400                     4.3
22                    484                     6.3
24                    576                     8.3
25                    625                     9.3
$\sum x_i = 157$      $\sum x_i^2 = 2899$     $\sum |x_i - \bar{x}| = 57$

Where, $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{157}{10} = 15.7$

∴ MD $= \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} = \dfrac{57}{10} = 5.7$ (Ans.)

∴ $S^2 = \dfrac{\sum_{i=1}^{n} x_i^2}{n} - \left(\dfrac{\sum_{i=1}^{n} x_i}{n}\right)^2 = \dfrac{2899}{10} - \left(\dfrac{157}{10}\right)^2 = 289.9 - \dfrac{24649}{100} = 289.9 - 246.49 = 43.41$ (Ans.)

∴ S $= \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} = \sqrt{43.41} = 6.59$ (Ans.)
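The three measures in this example can be reproduced with a short Python sketch. The function names are assumptions for illustration. Following the formulas in these notes, the variance divides by n (not by n − 1, which some textbooks and libraries use for sample data).

```python
import math


def mean(values):
    return sum(values) / len(values)


def mean_deviation(values):
    """Average absolute deviation from the arithmetic mean."""
    centre = mean(values)
    return sum(abs(x - centre) for x in values) / len(values)


def variance(values):
    """Mean of squared deviations from the mean (divisor n)."""
    centre = mean(values)
    return sum((x - centre) ** 2 for x in values) / len(values)


def standard_deviation(values):
    return math.sqrt(variance(values))


data = [5, 8, 10, 12, 15, 16, 20, 22, 24, 25]
md = mean_deviation(data)
var = variance(data)
sd = standard_deviation(data)
print(round(md, 2), round(var, 2), round(sd, 2))
```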

Example: The lengths of 32 leaves were measured correct to the nearest mm. Find the variance and standard deviation.

Lengths      20 – 22   23 – 25   26 – 28   29 – 31   32 – 34
Frequency    3         6         12        9         2

Solution:

Lengths      $f_i$              Midvalue ($x_i$)     $f_i x_i$              $f_i x_i^2$
20 – 22      3                  21                   63                     1323
23 – 25      6                  24                   144                    3456
26 – 28      12                 27                   324                    8748
29 – 31      9                  30                   270                    8100
32 – 34      2                  33                   66                     2178
Total        $\sum f_i = 32$    $\sum x_i = 135$     $\sum f_i x_i = 867$   $\sum f_i x_i^2 = 23805$

∴ Variance, $S^2 = \dfrac{\sum_{i=1}^{n} f_i x_i^2}{n} - \left(\dfrac{\sum_{i=1}^{n} f_i x_i}{n}\right)^2 = \dfrac{23805}{32} - \left(\dfrac{867}{32}\right)^2 = 743.9 - 734.07 = 9.83\ \text{mm}^2$ (Ans.)

∴ Standard deviation, S $= \sqrt{\dfrac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n}} = \sqrt{9.83} = 3.14$ mm (Ans.)
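For grouped data the same computation uses the class midvalues weighted by their frequencies. The Python sketch below uses hypothetical function names and the shortcut form of the variance (mean of squares minus square of the mean, with divisor n). Carried at full precision it gives a variance of about 9.83 and a standard deviation of about 3.14 for the leaf data.

```python
import math


def grouped_variance(midvalues, frequencies):
    """Return sum(f*x^2)/n - (sum(f*x)/n)^2 for grouped data."""
    n = sum(frequencies)
    mean_of_squares = sum(f * x ** 2 for x, f in zip(midvalues, frequencies)) / n
    mean_value = sum(f * x for x, f in zip(midvalues, frequencies)) / n
    return mean_of_squares - mean_value ** 2


def grouped_standard_deviation(midvalues, frequencies):
    return math.sqrt(grouped_variance(midvalues, frequencies))


leaf_mid = [21, 24, 27, 30, 33]     # class midvalues in mm
leaf_freq = [3, 6, 12, 9, 2]        # number of leaves per class
leaf_var = grouped_variance(leaf_mid, leaf_freq)
leaf_sd = grouped_standard_deviation(leaf_mid, leaf_freq)
print(round(leaf_var, 2), round(leaf_sd, 2))
```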

Coefficient of variation: The coefficient of variation of a series of values is the ratio of the standard deviation to the mean, multiplied by 100. The coefficient of variation expresses the SD (standard deviation) as a percentage of the arithmetic mean; it is a relative measure of dispersion. The coefficient of variation is denoted by C.V., and if $\sigma$ is the standard deviation and $\bar{x}$ is the mean of the set of values, then the coefficient of variation is

C.V. $= \dfrac{\sigma}{\bar{x}} \times 100\%$

Note:
a) In the case of sample studies, we use 'S', the sample SD, and $\bar{x}$, the sample mean, in place of the population values.
b) The coefficient of variation is independent of the units used. For this reason, it is useful in comparing distributions where the units may be different. The disadvantage of the coefficient of variation is that it fails to be useful where $\bar{x}$ is close to zero.

Example: The distribution of age at marriage of grooms with brides of age group 15 – 19 is displayed here.

Age groups (years)    15 – 19   19 – 23   23 – 27   27 – 31   31 – 35   35 – 39
Number of grooms      8         59        47        23        6         4

Find MD, $S^2$, S and CV.



Solution:

Class interval   Midvalue ($x_i$)   Frequency ($f_i$)   $f_i x_i$   $f_i x_i^2$   $f_i (x_i - \bar{x})^2$   $f_i |x_i - \bar{x}|$
15 – 19          17                 8                   136         2312          419.3408                  57.92
19 – 23          21                 59                  1239        26019         619.3584                  191.16
23 – 27          25                 47                  1175        29375         27.1472                   35.72
27 – 31          29                 23                  667         19343         521.1248                  109.48
31 – 35          33                 6                   198         6534          460.4256                  52.56
35 – 39          37                 4                   148         5476          651.2704                  51.04
Total            162                n = 147             3563        89059         2698.6672                 497.88

Where, $\bar{x} = \dfrac{\sum_{i=1}^{n} f_i x_i}{n} = \dfrac{3563}{147} = 24.24$

∴ $MD_{\bar{x}} = \dfrac{\sum_{i=1}^{n} f_i |x_i - \bar{x}|}{n} = \dfrac{497.88}{147} = 3.3869$ (Ans.)

∴ Variance, $S^2 = \dfrac{\sum_{i=1}^{n} f_i x_i^2}{n} - \left(\dfrac{\sum_{i=1}^{n} f_i x_i}{n}\right)^2 = \dfrac{89059}{147} - \left(\dfrac{3563}{147}\right)^2 = 605.8435 - 587.4853 = 18.3582$ (Ans.)

∴ Standard deviation, S $= \sqrt{\dfrac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n}} = \sqrt{18.3582} = 4.28$ (Ans.)

∴ C.V. $= \dfrac{S}{\bar{x}} \times 100\% = \dfrac{4.28}{24.24} \times 100\% \approx 17.7\%$ (Ans.)
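The coefficient of variation for the marriage-age data can be recomputed in Python as a check. This is a sketch under stated assumptions: the function name `coefficient_of_variation` is hypothetical, the variance uses divisor n as in these notes, and no intermediate rounding is applied, so the result (roughly 17.7%) may differ in the second decimal place from a hand calculation that rounds S and the mean first.

```python
import math


def coefficient_of_variation(midvalues, frequencies):
    """Return (S / mean) * 100 for grouped data, with divisor n in S."""
    n = sum(frequencies)
    mean_value = sum(f * x for x, f in zip(midvalues, frequencies)) / n
    if mean_value == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    var = sum(f * (x - mean_value) ** 2
              for x, f in zip(midvalues, frequencies)) / n
    return math.sqrt(var) / mean_value * 100


age_midvalues = [17, 21, 25, 29, 33, 37]
groom_counts = [8, 59, 47, 23, 6, 4]
cv_percent = coefficient_of_variation(age_midvalues, groom_counts)
print(round(cv_percent, 1))
```

Because C.V. is a pure percentage, it could be compared directly with the C.V. of a variable measured in different units, which is the main use described above.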

Empirical Relation between Measures of Dispersion: For moderately skewed distributions, we have the empirical formulas

Mean deviation $= \dfrac{4}{5}$ (Standard deviation)

Quartile deviation $= \dfrac{2}{3}$ (Standard deviation)

These are consequences of the fact that, for the normal distribution, the mean deviation and the quartile deviation (semi inter-quartile range) are equal, respectively, to 0.7979 and 0.6745 times the standard deviation.

Properties of a good Spread (Measure of Dispersion): There are various measures of dispersion. The difficulty lies in choosing among them, as there are no hard and fast rules for selecting any one. A measure of dispersion is good or satisfactory if it possesses the following characteristics:
1. It should be easy to understand.
2. It should be simple to compute.
3. It should be based on all the observations.
4. It should be rigidly defined.
5. It should be capable of further algebraic treatment.
6. It should have sampling stability.
7. It should not be unduly affected by the presence of extreme values.

The standard deviation is the most popular and widely used measure of dispersion because it meets the first six of the above properties. It has great practical utility in sampling and statistical inference. From the comparison of the measures of dispersion, we can say that the standard deviation is a good measure of dispersion.


Moments

Definition of moments: Moments are popularly used to describe the characteristics of a distribution. A set of constants or descriptive measures which can characterize a set of values or observations uniquely is called moments. They represent a convenient and unifying method for summarizing many of the most commonly used descriptive statistical measures such as central tendency, variation etc. The Greek letter $\mu$ (read as mu) is generally used to denote the moments. Generally, moments are of two kinds:

a) Central or corrected moments: When the moments are computed about the arithmetic mean of the distribution, the moments are called central moments.

For ungrouped data: If $x_1, x_2, x_3, \dots, x_n$ is a data set of n values, then the rth corrected moment is the moment about their mean $\bar{x}$, defined as

$\mu_r = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^r}{n}$;  r = 1, 2, 3, 4, …

Now putting r = 1, 2, 3, 4 we have

1st central moment $\mu_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})}{n} = \dfrac{\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x}}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n} - \dfrac{n\bar{x}}{n} = \bar{x} - \bar{x} = 0$

2nd central moment $\mu_2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \sigma^2$ = variance.

3rd central moment $\mu_3 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n}$

4th central moment $\mu_4 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n}$

These four moments are known as the first four corrected or central moments.

For grouped data: If the values $x_1, x_2, x_3, \dots, x_n$ are repeated $f_1, f_2, f_3, \dots, f_n$ times respectively, then the rth central moment can be written as

$\mu_r = \dfrac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^r}{n}$;  r = 1, 2, 3, 4, …

All the terms are the same as for ungrouped data; just use $f_i$ in each central moment.

b) Raw moments: Sometimes we use an arbitrary value 'A' other than the arithmetic mean $\bar{x}$ to define the moments; the moments are then called raw moments.

For ungrouped data: If $x_1, x_2, x_3, \dots, x_n$ is a data set of n values, then the rth raw moment about an arbitrary value 'A' is defined as

$\mu'_r = \dfrac{\sum_{i=1}^{n} (x_i - A)^r}{n}$;  r = 1, 2, 3, 4, …

Now putting r = 1, 2, 3, 4 we have

1st raw moment $\mu'_1 = \dfrac{\sum_{i=1}^{n} (x_i - A)}{n} = \dfrac{\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} A}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n} - \dfrac{nA}{n} = \bar{x} - A$

2nd raw moment $\mu'_2 = \dfrac{\sum_{i=1}^{n} (x_i - A)^2}{n}$

3rd raw moment $\mu'_3 = \dfrac{\sum_{i=1}^{n} (x_i - A)^3}{n}$

4th raw moment $\mu'_4 = \dfrac{\sum_{i=1}^{n} (x_i - A)^4}{n}$

These four moments are known as the first four raw moments.

For grouped data: If the values $x_1, x_2, x_3, \dots, x_n$ are repeated $f_1, f_2, f_3, \dots, f_n$ times respectively, then the rth raw moment can be written as

$\mu'_r = \dfrac{\sum_{i=1}^{n} f_i (x_i - A)^r}{n}$;  r = 1, 2, 3, 4, …

All the terms are the same as for ungrouped data; just use $f_i$ in each raw moment.

Moments about Origin: The raw moments about the origin are obtained by taking A = 0:

$\nu_r = \dfrac{\sum_{i=1}^{n} x_i^r}{n}$   [here $\nu$ is read as nu]

1st moment about the origin, by definition, $\nu_1 = \dfrac{\sum_{i=1}^{n} x_i}{n} = \bar{x}$

The 1st raw moment about origin is equal to the arithmetic mean.

1st four central moments in terms of raw moments: Let $x_1, x_2, x_3, \ldots, x_n$ be a set of n observations with arithmetic mean $\bar{x}$, and let A be any arbitrary value, where $\bar{x} \neq A$. The four raw moments are

$$\mu_1' = \frac{\sum_{i=1}^{n} (x_i - A)}{n} = \frac{\sum x_i}{n} - \frac{\sum A}{n} = \bar{x} - \frac{nA}{n} = \bar{x} - A,$$

$$\mu_2' = \frac{\sum_{i=1}^{n} (x_i - A)^2}{n}, \qquad \mu_3' = \frac{\sum_{i=1}^{n} (x_i - A)^3}{n}, \qquad \mu_4' = \frac{\sum_{i=1}^{n} (x_i - A)^4}{n}.$$

∴ 1st central moment

$$\mu_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})}{n} = \frac{\sum x_i}{n} - \frac{\sum \bar{x}}{n} = \bar{x} - \frac{n\bar{x}}{n} = \bar{x} - \bar{x} = 0$$

∴ 2nd central moment

$$\mu_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{\sum (x_i - A - \bar{x} + A)^2}{n} = \frac{\sum \{(x_i - A) - (\bar{x} - A)\}^2}{n}$$

$$= \frac{\sum (x_i - A)^2}{n} - 2(\bar{x} - A)\frac{\sum (x_i - A)}{n} + \frac{n(\bar{x} - A)^2}{n}$$

$$= \mu_2' - 2\mu_1'\mu_1' + (\mu_1')^2 = \mu_2' - 2(\mu_1')^2 + (\mu_1')^2$$

$$\therefore \mu_2 = \mu_2' - (\mu_1')^2.$$

∴ 3rd central moment

$$\mu_3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n} = \frac{\sum (x_i - A - \bar{x} + A)^3}{n} = \frac{\sum \{(x_i - A) - (\bar{x} - A)\}^3}{n}$$

$$= \frac{\sum (x_i - A)^3}{n} - 3(\bar{x} - A)\frac{\sum (x_i - A)^2}{n} + 3(\bar{x} - A)^2\frac{\sum (x_i - A)}{n} - \frac{n(\bar{x} - A)^3}{n}$$

$$= \mu_3' - 3\mu_2'\mu_1' + 3\mu_1'(\mu_1')^2 - (\mu_1')^3 = \mu_3' - 3\mu_2'\mu_1' + 3(\mu_1')^3 - (\mu_1')^3$$

$$\therefore \mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3.$$

∴ 4th central moment

$$\mu_4 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n} = \frac{\sum (x_i - A - \bar{x} + A)^4}{n} = \frac{\sum \{(x_i - A) - (\bar{x} - A)\}^4}{n}$$

$$= \frac{\sum (x_i - A)^4}{n} - 4(\bar{x} - A)\frac{\sum (x_i - A)^3}{n} + 6(\bar{x} - A)^2\frac{\sum (x_i - A)^2}{n} - 4(\bar{x} - A)^3\frac{\sum (x_i - A)}{n} + \frac{n(\bar{x} - A)^4}{n}$$

$$= \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 4\mu_1'(\mu_1')^3 + (\mu_1')^4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 4(\mu_1')^4 + (\mu_1')^4$$

$$\therefore \mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4.$$

Note: The formula used for this moment is $(a - b)^4 = a^4 - 4a^3b + 6a^2b^2 - 4ab^3 + b^4$.
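The four conversion formulas can be verified on a small data set. The sketch below is illustrative and not part of the original notes: the helper names `moment_about` and `central_from_raw` and the data are assumptions. It computes the raw moments about A, converts them with the formulas above, and compares the result with central moments computed directly from the deviations about the mean.

```python
def moment_about(values, r, point):
    """r-th moment of ungrouped data about `point` (illustrative helper)."""
    return sum((x - point) ** r for x in values) / len(values)


def central_from_raw(m1, m2, m3, m4):
    """Convert the first four raw moments (about any A) to central moments."""
    mu1 = 0.0
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu1, mu2, mu3, mu4


data = [2, 4, 6, 8]          # made-up observations, mean = 5
A = 4                        # arbitrary value
raw = [moment_about(data, r, A) for r in (1, 2, 3, 4)]
converted = central_from_raw(*raw)

mean = sum(data) / len(data)
direct = tuple(moment_about(data, r, mean) for r in (1, 2, 3, 4))
```

For this data the raw moments about 4 are 1, 6, 16 and 72, and both routes give the central moments 0, 5, 0 and 41.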

1st four raw moments in terms of central moments: Let $x_1, x_2, x_3, \ldots, x_n$ be a set of n observations with arithmetic mean $\bar{x}$, and let A be any arbitrary value, where $\bar{x} \neq A$. Let $\bar{x} - A = S$. The four central moments are

$$\mu_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})}{n} = \frac{\sum x_i}{n} - \frac{\sum \bar{x}}{n} = \bar{x} - \frac{n\bar{x}}{n} = \bar{x} - \bar{x} = 0,$$

$$\mu_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}, \qquad \mu_3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n}, \qquad \mu_4 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n}.$$

∴ 1st raw moment

$$\mu_1' = \frac{\sum_{i=1}^{n} (x_i - A)}{n} = \frac{\sum x_i}{n} - \frac{\sum A}{n} = \bar{x} - \frac{nA}{n} = \bar{x} - A = S$$

∴ 2nd raw moment

$$\mu_2' = \frac{\sum_{i=1}^{n} (x_i - A)^2}{n} = \frac{\sum (x_i - \bar{x} + \bar{x} - A)^2}{n} = \frac{\sum \{(x_i - \bar{x}) + (\bar{x} - A)\}^2}{n}$$

$$= \frac{\sum (x_i - \bar{x})^2}{n} + 2(\bar{x} - A)\frac{\sum (x_i - \bar{x})}{n} + \frac{n(\bar{x} - A)^2}{n}$$

$$= \mu_2 + 2 \times 0 \times S + \frac{nS^2}{n} \qquad \left[\because \sum_{i=1}^{n} (x_i - \bar{x}) = 0\right]$$

$$= \mu_2 + S^2$$

∴ 3rd raw moment

$$\mu_3' = \frac{\sum_{i=1}^{n} (x_i - A)^3}{n} = \frac{\sum (x_i - \bar{x} + \bar{x} - A)^3}{n} = \frac{\sum \{(x_i - \bar{x}) + (\bar{x} - A)\}^3}{n}$$

$$= \frac{\sum (x_i - \bar{x})^3}{n} + 3(\bar{x} - A)\frac{\sum (x_i - \bar{x})^2}{n} + 3(\bar{x} - A)^2\frac{\sum (x_i - \bar{x})}{n} + \frac{n(\bar{x} - A)^3}{n}$$

$$= \mu_3 + 3\mu_2 S + 3 \times 0 \times S^2 + \frac{nS^3}{n} \qquad \left[\because \sum_{i=1}^{n} (x_i - \bar{x}) = 0\right]$$

$$= \mu_3 + 3\mu_2 S + S^3.$$

∴ 4th raw moment

$$\mu_4' = \frac{\sum_{i=1}^{n} (x_i - A)^4}{n} = \frac{\sum (x_i - \bar{x} + \bar{x} - A)^4}{n} = \frac{\sum \{(x_i - \bar{x}) + (\bar{x} - A)\}^4}{n}$$

$$= \frac{\sum (x_i - \bar{x})^4}{n} + 4(\bar{x} - A)\frac{\sum (x_i - \bar{x})^3}{n} + 6(\bar{x} - A)^2\frac{\sum (x_i - \bar{x})^2}{n} + 4(\bar{x} - A)^3\frac{\sum (x_i - \bar{x})}{n} + \frac{n(\bar{x} - A)^4}{n}$$

$$= \mu_4 + 4\mu_3 S + 6\mu_2 S^2 + 0 + \frac{nS^4}{n} \qquad \left[\because \sum_{i=1}^{n} (x_i - \bar{x}) = 0\right]$$

$$= \mu_4 + 4\mu_3 S + 6\mu_2 S^2 + S^4.$$

Note: The formula used for this moment is $(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4$.
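As a quick numerical illustration of the relations between raw and central moments, the following Python sketch converts central moments to raw moments about A using $S = \bar{x} - A$. The function name `raw_from_central` and the numbers are assumptions made for illustration; they are not taken from the notes.

```python
def raw_from_central(mu2, mu3, mu4, s):
    """First four raw moments about A from central moments, with s = mean - A."""
    m1 = s
    m2 = mu2 + s ** 2
    m3 = mu3 + 3 * mu2 * s + s ** 3
    m4 = mu4 + 4 * mu3 * s + 6 * mu2 * s ** 2 + s ** 4
    return m1, m2, m3, m4


# Made-up example: the observations 2, 4, 6, 8 have mean 5 and central
# moments mu2 = 5, mu3 = 0, mu4 = 41. Taking A = 4 gives S = 1.
raw = raw_from_central(5, 0, 41, 1)

# Direct computation of the raw moments about A = 4 for comparison.
data = [2, 4, 6, 8]
direct = tuple(sum((x - 4) ** r for x in data) / len(data) for r in (1, 2, 3, 4))
```

Both `raw` and `direct` come out as 1, 6, 16 and 72 for this data, which is what the formulas predict.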

Skewness

Skewness: Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. If the frequency curve (smoothed frequency polygon) of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be skewed to the right or to have positive skewness. If the reverse is true, it is said to be skewed to the left or to have negative skewness.

Positively skewed distribution: In a positively skewed distribution the value of the mean is the greatest and that of the mode the least; the median lies between the two.

Negatively skewed distribution: In a negatively skewed distribution the value of the mode is the greatest and that of the mean the least; the median lies between the two.

Symmetrical distribution: In a symmetrical distribution the values of the mean, median and mode coincide.

Asymmetrical distribution: A distribution which is not symmetrical is called a skewed distribution; such a distribution can be either positively or negatively skewed.

[Figure: three frequency curves. Symmetrical: AM = Me = Mo. Negatively skewed: AM < Me < Mo. Positively skewed: AM > Me > Mo.]

Tests of skewness: In order to ascertain whether a distribution is skewed or not, the following tests may be applied. Skewness is present if:

➢ The values of mean, median and mode do not coincide.
➢ When the data are plotted on a graph they do not give the normal bell-shaped form (i.e. when cut along a vertical line through the center, the two halves are not equal).
➢ The sum of the positive deviations from the median is not equal to the sum of the negative deviations.
➢ Quartiles are not equidistant from the median.
➢ Frequencies are not equally distributed at points of equal deviation from the mode.

Conversely stated, when skewness is absent (i.e. in case of a symmetrical distribution), the following conditions are satisfied:

➢ The values of mean, median and mode coincide.
➢ When the data are plotted on a graph they give the normal bell-shaped form (i.e. when cut along a vertical line through the center, the two halves are equal).
➢ The sum of the positive deviations from the median is equal to the sum of the negative deviations.
➢ Quartiles are equidistant from the median.
➢ Frequencies are equally distributed at points of equal deviation from the mode.

Pearsonian coefficient of skewness: To get the Pearsonian coefficient of skewness, we divide the difference between the mean and the mode by the standard deviation. Thus the formula for the Pearsonian coefficient of skewness is

$$S_k = \frac{\text{mean} - \text{mode}}{\text{Standard Deviation}} = \frac{\bar{x} - M_o}{SD}.$$

Here,

➢ If Mean > Mode, the distribution is said to be positively skewed.
➢ If Mean < Mode, the distribution is said to be negatively skewed.
➢ If Mean = Mode, the distribution is said to be symmetrical.

To avoid using the mode, we can employ the empirical formula

$$S_k = \frac{3(\text{mean} - \text{median})}{\text{Standard Deviation}} = \frac{3(\bar{x} - M_e)}{SD}.$$

Coefficients of skewness based on moments are the Beta coefficient and the Gamma coefficient. Their relation with each other is given below:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \; ; \text{ is never negative}$$

or

$$\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}} = \frac{\mu_3}{\sigma^3} \; ; \text{ can be positive or negative (it takes the sign of } \mu_3\text{)}$$

[Where $\mu_3$ and $\mu_2$ are the 3rd and 2nd central / corrected moments]

Here,

➢ If $\gamma_1 = 0$ (equivalently $\beta_1 = 0$ or $\mu_3 = 0$), the distribution is symmetrical, where mean, median and mode coincide.
➢ If $\gamma_1 < 0$ or $\mu_3 < 0$, the curve is negatively skewed, which means that the curve has an elongated left tail.
➢ If $\gamma_1 > 0$ or $\mu_3 > 0$, the curve is positively skewed, which indicates that the right tail of the frequency curve is longer than the left tail.

Note: Karl Pearson introduced the concept of Beta coefficients, which are known as Pearsonian coefficients; the Gamma notations were introduced by R.A. Fisher.

Example: The arithmetic mean and mode of a distribution are 120 and 123. If SD = 10, find Pearson's coefficient of skewness and hence comment.

Solution: Pearson's coefficient of skewness,

$$S_k = \frac{\text{mean} - \text{mode}}{\text{Standard Deviation}} = \frac{120 - 123}{10} = \frac{-3}{10} = -0.3$$

Comment: Since $S_k < 0$, the distribution is negatively skewed.
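The two Pearsonian formulas are easy to evaluate by machine. The sketch below is only an illustration; the function names are assumptions, and the median value used in the second call is made up, since the notes give none.

```python
def skewness_mode(mean, mode, sd):
    """Pearsonian coefficient of skewness: (mean - mode) / SD."""
    return (mean - mode) / sd


def skewness_median(mean, median, sd):
    """Empirical form that avoids the mode: 3 * (mean - median) / SD."""
    return 3 * (mean - median) / sd


def describe(sk):
    """Translate the sign of the coefficient into words."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"


# Worked example from the text: mean = 120, mode = 123, SD = 10.
sk = skewness_mode(120, 123, 10)
comment = describe(sk)

# Hypothetical median of 121 for the same distribution (not from the text).
sk_median_form = skewness_median(120, 121, 10)
```

The first call reproduces the worked example, giving a coefficient of −0.3 and the comment "negatively skewed".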

Kurtosis

Kurtosis: Kurtosis is the degree of peakedness or flatness of a distribution, usually taken relative to a normal distribution.

Leptokurtic: If a curve is more peaked than the normal curve, it is called leptokurtic.

Platykurtic: If a curve is more flat-topped than the normal curve, it is called platykurtic.

Mesokurtic: The normal curve itself is known as mesokurtic.

The condition of peakedness or flatness itself is known as kurtosis or excess. The concept of kurtosis is rarely used in elementary statistical analysis. The diagram illustrates the shape of the three curves mentioned above:

[Figure: three frequency curves. Mesokurtic: $\beta_2 = 3$. Platykurtic: $\beta_2 < 3$. Leptokurtic: $\beta_2 > 3$.]

Measures of Kurtosis: The most important measure of kurtosis is based on the 2nd and 4th central moments. It is denoted by $\beta_2$ and defined as

$$\beta_2 = \frac{\mu_4}{\mu_2^2}.$$

Here,

➢ If $\beta_2 = 3$, then the distribution is mesokurtic.
➢ If $\beta_2 > 3$, then the distribution is leptokurtic.
➢ If $\beta_2 < 3$, then the distribution is platykurtic.

Sometimes $\gamma_2$, a quantity derived from $\beta_2$, is used as a measure of kurtosis. $\gamma_2$ is defined as

$$\gamma_2 = \beta_2 - 3$$

Then,

➢ If $\gamma_2 = 0$, then the distribution is mesokurtic.
➢ If $\gamma_2 > 0$, then the distribution is leptokurtic.
➢ If $\gamma_2 < 0$, then the distribution is platykurtic.

Example 1: For a distribution we have $\mu_2 = 2.5$, $\mu_3 = 0.7$, $\mu_4 = 18.7$. Find $\beta_1$ and $\beta_2$ and hence comment on the shape of the distribution.

Solution:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(0.7)^2}{(2.5)^3} = \frac{0.49}{15.625} = 0.03136$$

And

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{18.7}{(2.5)^2} = \frac{18.7}{6.25} = 2.99 \approx 3$$

Comment: Since $\mu_3 = 0.7 > 0$ and $\beta_2 \approx 3$, the distribution is slightly positively skewed and approximately mesokurtic.

Example 2: For a distribution we have $\mu_2 = 3$, $\mu_3 = -2$, $\mu_4 = 30$. Find $\beta_1$ and $\beta_2$ and hence comment on the shape of the distribution.

Solution:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-2)^2}{(3)^3} = \frac{4}{27} = 0.15$$

And

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{30}{(3)^2} = \frac{30}{9} = 3.33$$

Comment: $\beta_1 = 0.15$ shows that a small amount of skewness is present, but because $\beta_1$ is never negative its direction must be read from $\mu_3$. Since $\mu_3 = -2 < 0$ and $\beta_2 > 3$, the distribution is slightly negatively skewed and leptokurtic.

Example 3: For a distribution we have $\mu_2 = 1.5$, $\mu_3 = 0$, $\mu_4 = 6$. Find $\beta_1$ and $\beta_2$ and hence comment on the shape of the distribution.

Solution:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(0)^2}{(1.5)^3} = \frac{0}{3.375} = 0$$

And

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{6}{(1.5)^2} = \frac{6}{2.25} = 2.67$$

Comment: Since $\mu_3 = 0$ (or $\beta_1 = 0$) and $\beta_2 < 3$, the distribution is symmetrical and platykurtic.

Example 4: The first 4 moments of a distribution about the value 5 of the variable are 2, 20, 40 and 50. Find the mean, variance, $\beta_1$, $\beta_2$, and hence comment on the shape of the distribution.

Solution: Here, $\mu_1' = 2$, $\mu_2' = 20$, $\mu_3' = 40$, $\mu_4' = 50$ and A = 5.

We have $\mu_1' = \bar{x} - A$, so $\bar{x} = \mu_1' + A = 2 + 5 = 7$

∴ Mean = 7

∴ Variance $\mu_2 = \mu_2' - (\mu_1')^2 = 20 - (2)^2 = 20 - 4 = 16$

∴ $\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = 40 - 3 \times 20 \times 2 + 2 \times (2)^3 = 40 - 120 + 16 = -64$

∴ $\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 50 - 4 \times 40 \times 2 + 6 \times 20 \times (2)^2 - 3 \times (2)^4 = 50 - 320 + 480 - 48 = 162$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-64)^2}{(16)^3} = \frac{4096}{4096} = 1$$

And

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{162}{(16)^2} = \frac{162}{256} = 0.63$$

Comment: Since $\mu_3 < 0$ and $\beta_2 < 3$, the distribution is negatively skewed and platykurtic.

Example 5: We have the following information obtained from 150 children in a community.

                 Height (inch)    Weight (kg)
Mean ($\bar{x}$)       40              10
SD                      5               2

Which characteristic, height or weight, is more consistent (has less variability)?

Solution:

C.V. for height = $\frac{5}{40} \times 100 = 12.5\%$

C.V. for weight = $\frac{2}{10} \times 100 = 20\%$

Height is more consistent, i.e. it has less variability.
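The steps of Example 4 and Example 5 can be reproduced with a short Python sketch. It is an illustration only; the function names `shape_from_raw` and `coefficient_of_variation` are assumptions and not part of the notes.

```python
def shape_from_raw(m1, m2, m3, m4, about):
    """Mean, variance, beta1 and beta2 from the first four raw moments about `about`."""
    mean = m1 + about
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    beta1 = mu3 ** 2 / mu2 ** 3      # amount of skewness (never negative)
    beta2 = mu4 / mu2 ** 2           # kurtosis, compared with 3
    return {"mean": mean, "mu2": mu2, "mu3": mu3, "mu4": mu4,
            "beta1": beta1, "beta2": beta2}


def coefficient_of_variation(sd, mean):
    """C.V. as a percentage; the smaller value marks the more consistent series."""
    return sd / mean * 100


# Example 4: raw moments 2, 20, 40, 50 about the value 5.
result = shape_from_raw(2, 20, 40, 50, about=5)

# Example 5: height (mean 40, SD 5) against weight (mean 10, SD 2).
cv_height = coefficient_of_variation(5, 40)
cv_weight = coefficient_of_variation(2, 10)
```

For Example 4 this gives mean 7, variance 16, $\mu_3 = -64$, $\mu_4 = 162$, $\beta_1 = 1$ and $\beta_2 \approx 0.63$; the direction of skewness is read from the sign of `mu3`, since `beta1` cannot be negative.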

Correlation

Correlation: Correlation analysis is the statistical tool we can use to describe the degree to which one variable is linearly related to another. Correlation means the linear relationship between two or more variables. Thus correlation analysis refers to the techniques used in measuring the closeness of the relationship between variables. If a change in one variable is accompanied by a change in the other, the variables are said to be correlated.

Whenever two variables are so related that a change in the value of one is accompanied by a change in the value of the other, in such a way that

a) an increase in the one is accompanied by an increase or decrease in the other, or
b) a decrease in the one is accompanied by a decrease or increase in the other,

then the variables are said to be correlated.

Methods of studying correlation: The following are the important methods of ascertaining whether two variables are correlated or not.

1. Scatter diagram method.
2. Karl Pearson's coefficient of correlation.
3. Spearman's rank coefficient of correlation.

Limits of the coefficient of correlation: In Pearson's formula, no correlation means r = 0 and perfect correlation means |r| = 1. Perfect correlation can be positive or negative. Thus perfect positive correlation is represented by +1 and perfect negative correlation is represented by −1. Here the coefficient of correlation 'r' is an abstract or pure number which measures the degree of the relationship between two variables.

Linear correlation: If x and y denote the two variables under consideration, a scatter diagram shows the location of the points (x, y) on a rectangular coordinate system. If all points in this scatter diagram seem to lie near a straight line, the correlation is called linear.

➢ If y tends to increase as x increases, the correlation is called positive or direct correlation.
➢ If y tends to decrease as x increases, the correlation is called negative or inverse correlation.
➢ If all points seem to lie near some curve, the correlation is called nonlinear correlation. Nonlinear correlation can be sometimes positive and sometimes negative.
➢ If there is no relationship indicated between the variables, we say that there is no correlation between them (i.e. they are uncorrelated).

[Figure: three scatter diagrams. Positive linear correlation; negative linear correlation; no correlation.]

Non-linear Correlation: If the amount of change in one variable does not bear a constant ratio to the corresponding change in the other, the correlation is called non-linear or curvilinear. Example: If we double the study hours of students, the results will not improve at the same rate.

Scatter Diagram / Dot Diagram: Drawing a scatter diagram, the first step in correlation analysis, is a way to visualize the relationship. The term scatter refers to the dispersion of the dots on the graph. The simplest device for ascertaining whether two variables are related is to prepare a dot chart called a scatter diagram. It is the simplest way of presenting bivariate data diagrammatically. Thus for the bivariate distribution $(x_i, y_i)$; $i = 1, 2, 3, \ldots, n$, if the values of the variables x and y are plotted along the x-axis and y-axis respectively in the xy plane, the diagram of dots obtained is known as a scatter diagram (dot diagram).

[Figure: seven scatter diagrams. Perfect positive correlation; high degree of positive correlation; low degree of positive correlation; perfect negative correlation; high degree of negative correlation; low degree of negative correlation; no correlation.]

From the scatter diagram, we can form a fairly good idea whether the variables are correlated or not. If the points are very dense, lying very close to a straight line, there is strong or high correlation between the variables. If the points are widely scattered around the line, there is weak or low correlation between the two variables.

Merits and limitation of the Scatter diagram:

Merits:

➢ This method is very easy and simple, as it is a non-mathematical method of studying correlation between the variables. To some extent, the degree of correlation may also be guessed from it.
➢ It is not influenced by the size of extreme items, whereas most of the mathematical methods of finding correlation lack this quality.
➢ Drawing a scatter diagram is usually the first step in investigating the relationship between two variables.

Limitation:

➢ The scatter diagram only shows the type of correlation between the two variables. The degree of correlation may be guessed from it to some extent, but the exact degree of correlation cannot be obtained from it.

Karl Pearson's coefficient of correlation: Of the several mathematical methods of measuring correlation, Karl Pearson's method, popularly known as the Pearsonian coefficient of correlation, is the most widely used in practice. The coefficient of correlation is denoted by the symbol 'r'. If the two variables are x and y, the following formula suggested by Pearson can be used for measuring the degree of relation:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left\{\sum x^2 - \dfrac{(\sum x)^2}{n}\right\}\left\{\sum y^2 - \dfrac{(\sum y)^2}{n}\right\}}}.$$

Interpreting the coefficient of correlation: The following general guidelines help in interpreting the value of 'r'. The coefficient of correlation can never be greater than +1 or less than −1.

➢ When r = +1: There exists a perfect positive correlation between the two variables.
➢ When r = −1: There exists a perfect negative correlation between the two variables.
➢ When r = 0: There is no linear relationship between the two variables. If two variables are independent, the correlation between them is zero, but the converse is not true (i.e. if the correlation between two variables is zero, they are not necessarily independent; a zero coefficient of correlation only shows the absence of a linear relationship between the two variables).

The values between +1 and −1 are interpreted accordingly.

Note: In practice, values of 'r' such as +1, −1 and 0 are rare. We normally get values which lie between +1 and −1, such as +0.8, −0.4 etc. The coefficient of correlation describes not only the magnitude of correlation but also its direction. Thus +0.8 would mean that the correlation is positive, because the sign of 'r' is positive, and that the magnitude of correlation is 0.8.


Properties of the coefficient of correlation: The following are the important properties of the coefficient of correlation, r:

1. The coefficient of correlation lies between −1 and +1. Symbolically: $-1 \le r \le 1$ or $|r| \le 1$.
2. The coefficient of correlation is independent of change of origin and scale.
3. It gives the degree of concomitant movement or variation between two variables and is symmetric in them. Symbolically: $r_{xy} = r_{yx}$.
4. The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically: $r = \sqrt{b_{xy} \times b_{yx}}$.
5. If x and y are independent variables, then the coefficient of correlation is zero. However, the converse is not true.

Limitation:

1. To determine the coefficient of correlation 'r' we have to assume that the relationship is linear, not non-linear.
2. It is valid when we have a random sample from a bivariate normal distribution.
3. If the sample size is small, it does not give a reliable measure of the relation.

Comments on coefficient of correlation (r): The following chart shows the approximate interpretation.

Values of r            Comments
r = +1                 Perfect positive correlation
r = −1                 Perfect negative correlation
r > 0.8                High degree of positive correlation
r < −0.8               High degree of negative correlation
0.2 < r < 0.8          Moderate degree of positive correlation
−0.8 < r < −0.2        Moderate degree of negative correlation
0 < r < 0.2            Low degree of positive correlation
−0.2 < r < 0           Low degree of negative correlation
r = 0                  No correlation

The following drawing summarizes the strength and direction of the coefficient of correlation.

[Figure: scale of the coefficient of correlation running from −1.00 (perfect negative correlation) through −0.50 (moderate negative correlation), 0 (no correlation) and 0.50 (moderate positive correlation) to 1.00 (perfect positive correlation). Values near 0 indicate weak correlation and values near ±1 indicate strong correlation; the left half of the scale is negative correlation and the right half is positive correlation.]


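Proof: $-1 \le r \le 1$

If the coefficient of correlation is r, then

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

Let

$$a = \frac{x - \bar{x}}{\sqrt{\sum (x - \bar{x})^2}}, \qquad b = \frac{y - \bar{y}}{\sqrt{\sum (y - \bar{y})^2}}$$

so that $\sum a^2 = 1$, $\sum b^2 = 1$ and $\sum ab = r$. Then,

$$\sum (a + b)^2 = \sum a^2 + 2\sum ab + \sum b^2 = 1 + 2r + 1 = 2(1 + r) \ge 0 \quad \text{or} \quad 1 + r \ge 0 \quad \ldots\ldots \text{(i)}$$

Similarly,

$$\sum (a - b)^2 = \sum a^2 - 2\sum ab + \sum b^2 = 1 - 2r + 1 = 2(1 - r) \ge 0 \quad \text{or} \quad 1 - r \ge 0 \quad \ldots\ldots \text{(ii)}$$

From (i) and (ii), we can say that $-1 \le r \le 1$ (proven).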

Coefficient of Determination: The coefficient of determination is the square of the coefficient of correlation 'r'. It is usually denoted by $r^2$. It expresses the proportion of the total variation of the dependent variable that is explained by the independent variable.

Example: If r = 0.8 then $r^2 = 0.64$, which indicates that 64% of the total variation in the dependent variable has been explained by the independent variable. It is an easier and more useful measure than the coefficient of correlation 'r'.

Example: The following figures give the rainfall in inches for the year and the production, in hundreds of kg, of the Rabi crop and the Kharif crop. Calculate Karl Pearson's coefficient of correlation between rainfall and total production.

Rainfall             20   22   24   26   28   30   32
Rabi production      15   18   20   32   40   39   40
Kharif production    15   17   20   18   20   21   15

Solution: Let rainfall be denoted by 'x' and total production (Rabi + Kharif) by 'y'.

x            y            xy            x²             y²
20           30           600           400            900
22           35           770           484            1225
24           40           960           576            1600
26           50           1300          676            2500
28           60           1680          784            3600
30           60           1800          900            3600
32           55           1760          1024           3025
Σx = 182     Σy = 330     Σxy = 8870    Σx² = 4844     Σy² = 16450

$$\therefore r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left\{\sum x^2 - \dfrac{(\sum x)^2}{n}\right\}\left\{\sum y^2 - \dfrac{(\sum y)^2}{n}\right\}}} = \frac{8870 - \dfrac{182 \times 330}{7}}{\sqrt{\left\{4844 - \dfrac{(182)^2}{7}\right\}\left\{16450 - \dfrac{(330)^2}{7}\right\}}}$$

$$= \frac{8870 - 8580}{\sqrt{112 \times 892.86}} = \frac{290}{10.58 \times 29.88} = \frac{290}{316.1304} = 0.917$$

Comment: It is a case of a very high degree of positive correlation between rainfall and agricultural production.

Example: The following data on x and y are given.

x    3    5    6    8    10    11
y    5    6    5    9    12    10

Calculate the coefficient of correlation.

Solution:

x           y           xy           x²           y²
3           5           15           9            25
5           6           30           25           36
6           5           30           36           25
8           9           72           64           81
10          12          120          100          144
11          10          110          121          100
Σx = 43     Σy = 47     Σxy = 377    Σx² = 355    Σy² = 411

$$\therefore r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left\{\sum x^2 - \dfrac{(\sum x)^2}{n}\right\}\left\{\sum y^2 - \dfrac{(\sum y)^2}{n}\right\}}} = \frac{377 - \dfrac{43 \times 47}{6}}{\sqrt{\left\{355 - \dfrac{(43)^2}{6}\right\}\left\{411 - \dfrac{(47)^2}{6}\right\}}}$$

$$= \frac{377 - 336.83}{\sqrt{(355 - 308.17)(411 - 368.17)}} = \frac{40.17}{\sqrt{46.83 \times 42.83}} = \frac{40.17}{44.79} \approx 0.90$$

Interpretation: There exists a strong positive linear relation between x and y.
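Both worked examples can be checked with a few lines of Python. The sketch below is illustrative rather than definitive: the function name `pearson_r` is an assumption, and it simply applies the computational form of Pearson's formula that uses the sums Σx, Σy, Σxy, Σx² and Σy².

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Rainfall example: y is the total of Rabi and Kharif production.
rainfall = [20, 22, 24, 26, 28, 30, 32]
rabi = [15, 18, 20, 32, 40, 39, 40]
kharif = [15, 17, 20, 18, 20, 21, 15]
total = [r + k for r, k in zip(rabi, kharif)]
r_rain = pearson_r(rainfall, total)

# Second example.
r_xy = pearson_r([3, 5, 6, 8, 10, 11], [5, 6, 5, 9, 12, 10])

# Coefficient of determination for the rainfall data.
r_squared = r_rain ** 2
```

Carried to full precision the two coefficients are about 0.917 and 0.897. Hand calculation with intermediate rounding can differ from the second value by about 0.01.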

Regression

Regression Analysis: Regression analysis is concerned with the study of the dependence of one variable (the dependent variable) on one or more other variables (the independent variables), for estimating the average value of the dependent variable in terms of the known values of the independent variables.

Simple Regression Analysis: The term simple regression analysis indicates that the value of a dependent variable is estimated on the basis of one independent variable.


Properties of the Regression Coefficient:

1. The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically: $r = \sqrt{b_{xy} \times b_{yx}}$.

Proof: We know that

$$b_{xy} = r\frac{\sigma_x}{\sigma_y}; \qquad b_{yx} = r\frac{\sigma_y}{\sigma_x};$$

$$\therefore b_{xy} \times b_{yx} = r\frac{\sigma_x}{\sigma_y} \times r\frac{\sigma_y}{\sigma_x} = r^2$$

$$\therefore \sqrt{b_{xy} \times b_{yx}} = r$$

where r takes the sign common to the two regression coefficients.

Regression Equation: Statisticians have derived two equations that we can use to find the slope and Y-intercept of the best fitting regression line. The first formula calculates the slope.

Slope of the best fitting regression line:

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$

Here,
b = Slope of the best fitting estimating line.
x = Values of the independent variable.
y = Values of the dependent variable.
n = Number of data points (that is, the number of pairs of values for the independent and dependent variables).

The second formula calculates the Y-intercept of the line whose slope we calculated using the first equation.

Y-intercept of the best fitting regression line:

$$a = \bar{y} - b\bar{x} \quad \text{or} \quad a = \frac{\sum y}{n} - b\frac{\sum x}{n}$$

Here,
a = Y-intercept.
b = Slope from the first equation.
$\bar{y}$ = Mean of the values of the dependent variable.
$\bar{x}$ = Mean of the values of the independent variable.

With these two equations, we can find the best fitting regression line for any two-variable set of data points. Here a and b are called the least squares estimates.

The equation for a fitted straight regression line, where the dependent variable y is determined by the independent variable x, is

$$y = a + bx$$

Here,
y = Dependent variable.
a = y-intercept.
b = Slope of the line.
x = Independent variable.


Using this equation we can take a given value of x and compute the value of y. The a is called the y-intercept because its value is the point at which the regression line crosses the y-axis, that is, the vertical axis. The b in the equation is the slope of the line. It represents how much each unit change of the independent variable x changes the dependent variable y. Both a and b are numerical constants because, for any given straight line, their values do not change.

Note:

➢ If x depends on y, then y is independent and x is dependent.
➢ If y depends on x, then x is independent and y is dependent.

Example: A line crosses the y-axis at 3, therefore we know a = 3. If a = 3 and b = 2, what would y be for x = 5?

Solution: We know that

$$y = a + bx = 3 + 2(5) = 3 + 10 = 13 \quad [\text{value of } y \text{ given } x = 5]$$

Example: A line crosses the y-axis at 3, therefore we know a = 3. We select two points on it, $(x_1, y_1) = (1, 5)$ and $(x_2, y_2) = (2, 7)$. Find the slope.

Solution:

[Figure: graph of the line with a = 3, passing through the point (0, 3), the 1st point (1, 5), the 2nd point (2, 7) and the point (5, 13); x-axis from 0 to 5, y-axis from 0 to 14.]

∴ The slope of the straight line:

$$b = \frac{y_2 - y_1}{x_2 - x_1} = \frac{7 - 5}{2 - 1} = \frac{2}{1} = 2$$

Thus the relationship between the variables is direct and the slope is positive. Now, with the numerical values of a and b determined, we can substitute into the general equation for a straight line:

$$y = a + bx = 3 + 2x$$

Assume that we wish to find the value of the dependent variable that corresponds to x = 5. Substituting into the equation,

$$y = 3 + 2(5) = 3 + 2 \times 5 = 13$$

Thus when x = 5, y must equal 13. If we refer to the line in the graph, we can see that the point (5, 13) does lie on the line.


Example: A departmental store has the following statistics of sales, for a period of the last one year, of 10 salesmen who have varying years of experience.

Years of experience:   1    3    4    4     6     8     10    10    11    13
Annual sales:          80   97   92   102   103   111   119   123   117   136

➢ Fit a regression line of annual sales on years of experience.
➢ Interpret the estimated coefficients (a, the y-intercept, and b).
➢ Predict the annual sales volume of persons who have 12 and 15 years of experience.

Solution: We know the least squares estimates are

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} \quad \text{and} \quad a = \frac{\sum y}{n} - b\frac{\sum x}{n}$$

Calculation:

Experience (x)    Sales (y)     x²            xy
1                 80            1             80
3                 97            9             291
4                 92            16            368
4                 102           16            408
6                 103           36            618
8                 111           64            888
10                119           100           1190
10                123           100           1230
11                117           121           1287
13                136           169           1768
Σx = 70           Σy = 1080     Σx² = 632     Σxy = 8128

$$\therefore b = \frac{8128 - \dfrac{70 \times 1080}{10}}{632 - \dfrac{(70)^2}{10}} = \frac{8128 - 7560}{632 - 490} = \frac{568}{142} = 4$$

$$\therefore a = \frac{1080}{10} - 4 \times \frac{70}{10} = 108 - 28 = 80$$

Therefore, the fitted regression line of y on x is: $y = a + bx = 80 + 4x$

Interpretation of the estimated coefficients (a and b): Here a = 80 means that if the store employs a person without any experience (i.e. x = 0), then the average sales value will be Tk. 80 thousand. Also b = 4 means that for each additional year of sales experience of a person, the sales volume would increase on average by Tk. 4 thousand.

Prediction: $y = 80 + 4x$

∴ The annual sales volume for 12 years of experience = 80 + 4 × 12 = 128.

∴ The annual sales volume for 15 years of experience = 80 + 4 × 15 = 140.
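The least squares calculation above can be reproduced with a short Python sketch. The names `fit_line` and `predict` are assumptions chosen for illustration; the code applies the two formulas for b and a to the sales data of this example.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) of the line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # slope
    a = sum_y / n - b * sum_x / n                                  # y-intercept
    return a, b


def predict(a, b, x):
    """Estimated value of the dependent variable for a given x."""
    return a + b * x


experience = [1, 3, 4, 4, 6, 8, 10, 10, 11, 13]
sales = [80, 97, 92, 102, 103, 111, 119, 123, 117, 136]

a, b = fit_line(experience, sales)
sales_12 = predict(a, b, 12)
sales_15 = predict(a, b, 15)
```

For this data the sketch gives a = 80 and b = 4, so the predicted sales are 128 for 12 years and 140 for 15 years of experience. A prediction at 15 years lies outside the observed range of 1 to 13 years and should be read with that in mind.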



Example: A line crosses the y-axis at 6, therefore we know a = 6. We select two points on it, $(x_1, y_1) = (0, 6)$ and $(x_2, y_2) = (1, 3)$. Find the slope.

Solution:

[Figure: graph of the line with a = 6, passing through the 1st point (0, 6), the 2nd point (1, 3) and the point (2, 0); x-axis from 0 to 6, y-axis from 0 to 8.]

∴ The slope of the straight line:

$$b = \frac{y_2 - y_1}{x_2 - x_1} = \frac{3 - 6}{1 - 0} = \frac{-3}{1} = -3$$

When b is negative, the line represents an inverse relationship and the slope is negative (y decreases as x increases). Now, with the numerical values of a and b determined, we can substitute into the general equation for a straight line:

$$y = a + bx = 6 + (-3)x = 6 - 3x$$

Assume that we wish to find the value of the dependent variable that corresponds to x = 2. Substituting into the equation,

$$y = 6 + (-3)(2) = 6 - 6 = 0$$

Thus when x = 2, y must equal 0. If we refer to the line in the graph, we can see that the point (2, 0) does lie on the line.

Probability

Probability: Probability, also described as 'chance' or 'uncertainty', refers to the likelihood that an event will occur. In other words, a numerical measure of the certainty or uncertainty of an event of an experiment is called probability. Its value lies between zero (0) and one (1), inclusive, describing the relative possibility or chance that an event will occur. There are various approaches to defining probability. The principal approaches are as follows:


A. Classical or Mathematical or A Priori Approach: The classical approach to probability is based on the assumption that the outcomes of an experiment are equally likely and, taken together, exhaustive. Because this approach permits determination of probability values before any sample events are observed, it has also been called the a priori approach. The probability of an event A is

$P(A) = \frac{\text{No. of outcomes favorable to } A}{\text{Total number of outcomes}} = \frac{N(A)}{N(S)}$.

B. Relative Frequency Approach: In the relative frequency approach, the probability is determined on the basis of the proportion of times that a favorable outcome occurs in a number of observations or experiments. No prior assumption of equal likelihood is involved. Because determination of the probability values is based on observation and collection of data, this approach has also been called the empirical approach. The probability that event A will occur by the relative frequency approach is

$P(A) = \frac{\text{No. of observations of } A}{\text{Sample size}} = \frac{N(A)}{N}$.
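The two approaches can be compared with a short Python sketch. It assumes a fair six-sided die and the event "two dots"; the helper names `classical_probability` and `relative_frequency`, the number of trials and the seed are all illustrative choices, not part of the notes.

```python
import random
from fractions import Fraction


def classical_probability(event, sample_space):
    """N(A) / N(S), assuming all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def relative_frequency(event, sample_space, trials, seed=0):
    """N(A) / N, estimated from simulated repetitions of the experiment."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    outcomes = sorted(sample_space)
    hits = sum(1 for _ in range(trials) if rng.choice(outcomes) in event)
    return hits / trials


S = {1, 2, 3, 4, 5, 6}   # sample space of one die
A = {2}                  # event: the die shows two dots

p_classical = classical_probability(A, S)
p_empirical = relative_frequency(A, S, trials=60_000)

print("classical:", p_classical)
print("relative frequency:", round(p_empirical, 3))
```

With a large number of trials the relative frequency should come close to the classical value of 1/6, although it will rarely equal it exactly.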

Experiment: An experiment is an act that can be repeated under given conditions; usually the exact result of the experiment cannot be predicted with certainty.

Unit experiment: A unit experiment is known as a trial. This means that a trial is a special case of an experiment. An experiment may consist of one trial or of two or more trials.

Outcome: The results of an experiment are known as outcomes. Example: Throwing a die is a trial (unit experiment), and 1, 2, 3, 4, 5, 6 are the possible outcomes.

Random experiment: A random experiment is an experiment that can be repeated any number of times under identical conditions. In any random experiment, the outcome of a particular trial is not known beforehand, but all possible outcomes are known in advance. Example: I. Tossing a fair coin or throwing a die and observing what the top shows. II. Counting the number of road accidents per day in Dhaka City.

Events: An event is a possible outcome of an experiment, or a result of a trial or an observation. Events are generally denoted by A, B, C, etc.

Simple event: An elementary event or a simple event is a single possible outcome of an experiment. It is thus an event which cannot be further subdivided into a combination of other events. Example: In the case of rolling a die, getting two dots is a simple event.



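Compound events: When two or more events occur in connection with each other, their simultaneous occurrence is called a compound event. A compound event is an aggregate of simple events. Example: In the case of rolling a die, getting an even number, {2, 4, 6}, is a compound event.

Certain events: An event that is sure to occur is called a certain event. The probability of a certain event is one. Example: For every living being, to die is a certain event.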

Impossible events: An event whose occurrence is quite impossible in a random experiment is called an impossible event. The probability of an impossible event is zero. Example: To live without breathing is an impossible event.

Mutually exclusive events: If the happening of any one of the events excludes the happening of all the others, then the events are termed mutually exclusive or disjoint. Two events A and B are said to be mutually exclusive if they cannot occur together, that is, $A \cap B = \emptyset$. Example: If a single coin is tossed, either the head or the tail can be up; both cannot be up at the same time.

[Figure: Venn diagram of disjoint sets A and B]

Non-Exclusive events: Two or more events are non-exclusive when it is possible for them to occur together.

Collectively exhaustive events: Collectively exhaustive events are those which together include all possible outcomes. Example: In the case of throwing a die, the outcomes 1, 2, 3, 4, 5 and 6 together form a collectively exhaustive set of events. If $A \cup B = S$ (the sample space), then we can say that the events A and B are collectively exhaustive.

Sample space: The set or collection of all possible outcomes of a random experiment is known as the sample space. Every event is a subset of the sample space. It is usually denoted by the capital letter S. Example: If we toss a coin, and 'H' stands for a head and 'T' stands for a tail, then the sample space S of possible experimental outcomes may be written as $S = \{H, T\}$, and for a die, $S = \{1, 2, 3, 4, 5, 6\}$.

Sample point: Each possible outcome in a sample space is called a sample point. Example: If we toss a coin, and 'H' stands for a head and 'T' stands for a tail, then the sample points are H and T; for a die, each of 1, 2, 3, 4, 5, 6 is a sample point.
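These definitions map naturally onto sets. The following Python sketch assumes the sample space of one die; the events `even`, `odd` and `greater_than_3` and the helper functions are hypothetical names used only to illustrate the terms defined above.

```python
S = frozenset({1, 2, 3, 4, 5, 6})      # sample space of one die
even = frozenset({2, 4, 6})            # a compound event
odd = frozenset({1, 3, 5})
greater_than_3 = frozenset({4, 5, 6})


def is_event(event, space):
    """Every event is a subset of the sample space."""
    return event <= space


def mutually_exclusive(a, b):
    """Two events are mutually exclusive when their intersection is empty."""
    return not (a & b)


def collectively_exhaustive(events, space):
    """Events are collectively exhaustive when their union is the sample space."""
    return frozenset().union(*events) == space


print(is_event(even, S))
print(mutually_exclusive(even, odd))
print(mutually_exclusive(even, greater_than_3))
print(collectively_exhaustive([even, odd], S))
print(collectively_exhaustive([even, greater_than_3], S))
```

Here `even` and `odd` are both mutually exclusive and collectively exhaustive, while `even` and `greater_than_3` share the sample points 4 and 6 and leave out 1 and 3.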



Complementary events: The complement of an event implies the non-occurrence of the event. An event A and the event $\bar{A}$, consisting of all points of the sample space not in A, are called complementary events. Clearly the events A and $\bar{A}$ are mutually exclusive and collectively exhaustive. Example: Here $\bar{A}$ is the complement of A. Then $A \cup \bar{A} = S$ and $A \cap \bar{A} = \emptyset$, and the probabilities are $P(A \cup \bar{A}) = P(S) = 1$ and $P(A \cap \bar{A}) = P(\emptyset) = 0$.

[Figure: Venn diagram of A and its complement $\bar{A}$]

Laws of Probability: There are many situations in which the problems related to probability are not so simple. To deal with such situations we have to know some important laws of probability.

a. Additive law: Let A and B be two events. Then the probability of occurrence of either A or B is:

✓ P(A or B) = $P(A \cup B) = P(A) + P(B)$ [when the two events are mutually exclusive]

[Figure: Venn diagram of disjoint events A and B, $P(A \cup B) = P(A) + P(B)$]

✓ P(A or B) = $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, where $P(A \cap B)$ = P(A and B) = P(AB) [when the two events are not mutually exclusive]

[Figure: Venn diagram of overlapping events A and B with intersection $A \cap B$, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$]

b. Multiplicative law: Let A and B be two independent events. Then the probability that both A and B occur is:

✓ P(A and B) = $P(A \cap B) = P(AB) = P(A) \times P(B)$
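Both forms of the additive law can be verified by counting sample points. The sketch below assumes one fair die; the events `even`, `odd` and `gt3` and the helper `prob` are illustrative names, not taken from the notes.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                 # sample space of one die


def prob(event):
    """Classical probability n(E) / n(S) for equally likely outcomes."""
    return Fraction(len(event), len(S))


even, odd, gt3 = {2, 4, 6}, {1, 3, 5}, {4, 5, 6}

# Mutually exclusive events: P(A ∪ B) = P(A) + P(B)
lhs_exclusive = prob(even | odd)
rhs_exclusive = prob(even) + prob(odd)

# Events that are not mutually exclusive: subtract P(A ∩ B) once
lhs_general = prob(even | gt3)
rhs_general = prob(even) + prob(gt3) - prob(even & gt3)

print(lhs_exclusive, rhs_exclusive)
print(lhs_general, rhs_general)
```

In the second case the sample points 4 and 6 belong to both events, so without the subtraction they would be counted twice.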

Conditional probability: When two events are dependent, the concept of conditional probability is employed to designate the probability of occurrence of the related event. If A and B are two events, the probability that A occurs given that B has already occurred is denoted by $P(A/B)$ and is called the conditional probability of A given B. We define

$P(A/B) = \frac{P(B \cap A)}{P(B)}$, provided $P(B) > 0$.

Similarly, the conditional probability of B given A is

$P(B/A) = \frac{P(A \cap B)}{P(A)}$, provided $P(A) > 0$.

The expression $P(B/A)$ indicates the probability of event B occurring given that event A has occurred. The expression $P(A/B)$ indicates the probability of event A occurring given that event B has occurred. Note that here $B/A$ and $A/B$ are not fractions.

Independent events: Two events are independent when the occurrence or non-occurrence of one event has no effect on the probability of occurrence of the other event. Two events A and B are said to be independent if and only if $P(A \cap B) = P(A) P(B)$.
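The independence condition can be tested directly by enumeration. This sketch assumes two tosses of a fair coin; the event names and the helper `independent` are hypothetical and serve only to illustrate the condition stated above.

```python
from fractions import Fraction
from itertools import product

# Sample space of two coin tosses as ordered pairs, e.g. ("H", "T")
S = set(product("HT", repeat=2))


def prob(event):
    """Classical probability n(E) / n(S)."""
    return Fraction(len(event), len(S))


def independent(a, b):
    """True when P(A ∩ B) = P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)


head_first = {s for s in S if s[0] == "H"}    # head on the 1st toss
head_second = {s for s in S if s[1] == "H"}   # head on the 2nd toss
two_heads = {("H", "H")}

print(independent(head_first, head_second))
print(independent(head_first, two_heads))
```

The first pair of events satisfies the condition. The second pair does not, because knowing that the first toss is a head changes the chance of getting two heads.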

Dependent events: Two events are dependent when the occurrence or non-occurrence of one event does affect the probability of occurrence of the other event. For two dependent events A and B, the joint probability is obtained from the conditional probability: $P(A \cap B) = P(A/B) P(B)$, and likewise $P(B \cap A) = P(B/A) P(A)$.

Example: Suppose that a die is tossed once. There are six possible outcomes: 1, 2, 3, 4, 5, 6. Let us define events A and B such that A: {Odd number}; B: {A number greater than 3}. Find the probabilities of A and B.

Solution: Here the sample space is S = {1, 2, 3, 4, 5, 6}.

∴ A = {1, 3, 5}   ∴ $P(A) = \frac{n(A)}{n(S)} = \frac{3}{6} = \frac{1}{2}$. Ans.

∴ B = {4, 5, 6}   ∴ $P(B) = \frac{n(B)}{n(S)} = \frac{3}{6} = \frac{1}{2}$. Ans.
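Using the same events A and B, the conditional probabilities can be computed as a further illustration. This is a sketch that goes slightly beyond the worked solution; the helper names `prob` and `conditional` are assumptions made for the example.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of one die
A = {1, 3, 5}            # odd number
B = {4, 5, 6}            # number greater than 3


def prob(event):
    """Classical probability n(E) / n(S)."""
    return Fraction(len(event), len(S))


def conditional(a, b):
    """P(a/b) = P(a ∩ b) / P(b), assuming P(b) > 0."""
    return prob(a & b) / prob(b)


p_a_given_b = conditional(A, B)
p_b_given_a = conditional(B, A)
are_dependent = prob(A & B) != prob(A) * prob(B)

print("P(A/B) =", p_a_given_b)
print("P(B/A) =", p_b_given_a)
print("dependent:", are_dependent)
```

Only the sample point 5 lies in both events, so $P(A \cap B) = 1/6$, which differs from $P(A) P(B) = 1/4$; the two events are therefore dependent.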

Example: A coin is tossed two times (or two coins are tossed once) and the sequence of heads and tails is recorded. Set up the sample space and define the following events: A = {At least one head}; B = {One tail}; C = {Two heads}; D = {Two heads or two tails}. Hence find the probabilities of these events, and define the two events $A \cap B$ and $A \cup B$ and find their probabilities.

Solution: The sample space is shown below,

         H     T
    H    HH    HT
    T    TH    TT

∴ S = {HH, HT, TH, TT}

∴ Events: A = {HH, HT, TH}; B = {HT, TH}; C = {HH}; D = {HH, TT}.

∴ Calculation of probability:

$P(A) = \frac{n(A)}{n(S)} = \frac{3}{4}$;   $P(B) = \frac{n(B)}{n(S)} = \frac{2}{4} = \frac{1}{2}$;   $P(C) = \frac{n(C)}{n(S)} = \frac{1}{4}$;   $P(D) = \frac{n(D)}{n(S)} = \frac{2}{4} = \frac{1}{2}$.

∴ $A \cap B$ = {HT, TH}   ∴ $P(A \cap B) = \frac{2}{4} = \frac{1}{2}$.

∴ $A \cup B$ = {HH, HT, TH}

∴ $P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{3}{4} + \frac{1}{2} - \frac{1}{2} = \frac{3}{4}$. Ans.
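The whole example can be reproduced by enumerating the sample space. The sketch below assumes a fair coin and writes each outcome as a two-letter string; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space of two tosses written as strings: HH, HT, TH, TT
S = {"".join(t) for t in product("HT", repeat=2)}


def prob(event):
    """Classical probability n(E) / n(S)."""
    return Fraction(len(event), len(S))


A = {s for s in S if s.count("H") >= 1}   # at least one head
B = {s for s in S if s.count("T") == 1}   # exactly one tail
C = {s for s in S if s.count("H") == 2}   # two heads
D = {s for s in S if s in ("HH", "TT")}   # two heads or two tails

# Additive law for events that are not mutually exclusive
union_by_law = prob(A) + prob(B) - prob(A & B)

for name, event in [("A", A), ("B", B), ("C", C), ("D", D)]:
    print(name, sorted(event), prob(event))
print("A ∩ B", sorted(A & B), prob(A & B))
print("A ∪ B", sorted(A | B), prob(A | B), union_by_law)
```

The probability of the union obtained by counting its sample points agrees with the value obtained from the additive law.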

Example: Two coins are tossed once. Set up the sample space and define the following events: A = {Head on the 1st coin}; B = {Head on the 2nd coin}; $A \cap B$ = {both coins turn up heads}. Are these events independent?

Solution: The sample space is shown below,

         H     T
    H    HH    HT
    T    TH    TT

∴ S = {HH, HT, TH, TT}

∴ Events: A = {HH, HT}; B = {HH, TH}.

∴ $A \cap B$ = {HH}

∴ Calculation of probability:

$P(A) = \frac{n(A)}{n(S)} = \frac{2}{4} = \frac{1}{2}$;   $P(B) = \frac{n(B)}{n(S)} = \frac{2}{4} = \frac{1}{2}$.

Here, $P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{1}{4}$.

Also, $P(A) \cdot P(B) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

Therefore we can say that $P(A \cap B) = P(A) \cdot P(B)$, which means these events are independent.

Example: A coin is tossed two times. Construct the sample space and find the probability of the event A, at least one head, and also find the complement of A and its probability.

Solution: The sample space is shown below,

         H     T
    H    HH    HT
    T    TH    TT

∴ S = {HH, HT, TH, TT}

∴ Events: A = {at least one head} = {HH, HT, TH}. The complement of the event A is $\bar{A}$.

∴ $\bar{A}$ = {no head appears} = {TT}

∴ $P(A) = \frac{n(A)}{n(S)} = \frac{3}{4}$.

∴ $P(\bar{A}) = 1 - P(A) = 1 - \frac{3}{4} = \frac{1}{4}$.

Example: Two distinct dice are tossed together once and the numbers on their faces are recorded. Describe the sample space and find the probabilities of the following events: A = {the sum of the two dice is six}; B = {both dice show the same number}; C = {the sum of the two dice is 9}.

Solution: The sample space is shown below,

                                  2nd die
                  1       2       3       4       5       6
             1  (1,1)   (1,2)   (1,3)   (1,4)   (1,5)   (1,6)
             2  (2,1)   (2,2)   (2,3)   (2,4)   (2,5)   (2,6)
    1st die  3  (3,1)   (3,2)   (3,3)   (3,4)   (3,5)   (3,6)
             4  (4,1)   (4,2)   (4,3)   (4,4)   (4,5)   (4,6)
             5  (5,1)   (5,2)   (5,3)   (5,4)   (5,5)   (5,6)
             6  (6,1)   (6,2)   (6,3)   (6,4)   (6,5)   (6,6)

∴ Events:

A = {(1,5), (2,4), (3,3), (4,2), (5,1)}   ∴ $P(A) = \frac{n(A)}{n(S)} = \frac{5}{36}$. Ans.

B = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}   ∴ $P(B) = \frac{n(B)}{n(S)} = \frac{6}{36} = \frac{1}{6}$. Ans.

C = {(3,6), (4,5), (5,4), (6,3)}   ∴ $P(C) = \frac{n(C)}{n(S)} = \frac{4}{36} = \frac{1}{9}$. Ans.
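The 36 sample points and the three events can also be generated by enumeration. This Python sketch assumes two fair, distinguishable dice; the helper `prob` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (1st die, 2nd die)
S = set(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability n(E) / n(S)."""
    return Fraction(len(event), len(S))


A = {(i, j) for (i, j) in S if i + j == 6}   # the sum is six
B = {(i, j) for (i, j) in S if i == j}       # both dice show the same number
C = {(i, j) for (i, j) in S if i + j == 9}   # the sum is nine

print("n(S) =", len(S))
print("P(A) =", prob(A))
print("P(B) =", prob(B))
print("P(C) =", prob(C))
```

Counting in this way avoids missing a sample point when the events are listed by hand.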

Example: Suppose that a bag contains 10 white and 5 red balls. Two balls are drawn at random from the bag without replacement. Find the probability that the 1st ball is red and the 2nd ball is white.

Solution: Let R denote a red ball and W a white ball.

∴ R = {the 1st ball is red};   ∴ W = {the 2nd ball is white}

Here, $P(R) = \frac{5}{15} = \frac{1}{3}$

∴ $P(W/R) = \frac{10}{14} = \frac{5}{7}$ [since 1 ball has already been taken, 15 − 1 = 14 balls remain]

∴ $P(R \cap W) = P(R) \cdot P(W/R) = \frac{1}{3} \times \frac{5}{7} = \frac{5}{21}$. Ans.
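The result can be cross-checked in two ways: with the multiplication rule and by listing every ordered draw of two different balls. This is a sketch under the assumption that every ball is equally likely to be drawn; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

balls = ["W"] * 10 + ["R"] * 5          # 10 white and 5 red balls

# Multiplication rule: P(R ∩ W) = P(R) * P(W/R)
p_rule = Fraction(5, 15) * Fraction(10, 14)

# Enumeration: every ordered pair of two different balls (no replacement)
draws = list(permutations(range(len(balls)), 2))
favorable = sum(1 for i, j in draws if balls[i] == "R" and balls[j] == "W")
p_enum = Fraction(favorable, len(draws))

print("by the rule:", p_rule)
print("by enumeration:", p_enum)
```

There are 15 × 14 ordered draws, of which 5 × 10 start with a red ball and end with a white ball, so both methods give the same probability.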
