Virtual University of Pakistan Lecture No. 23 of the course on Statistics and Probability by Miss Saleha Naghmi Habibullah
IN THE LAST LECTURE, YOU LEARNT
•Bayes’ Theorem
•Discrete Random Variable
•Discrete Probability Distribution
•Graphical Representation of a Discrete Probability Distribution
•Mean, Standard Deviation and Coefficient of Variation of a Discrete Probability Distribution
•Distribution Function of a Discrete Random Variable
TOPICS FOR TODAY
•Graphical Representation of the Distribution Function of a Discrete Random Variable
•Mathematical Expectation
•Mean, Variance and Moments of a Discrete Probability Distribution
•Properties of Expected Values
First, let us consider the concept of the DISTRIBUTION FUNCTION of a discrete random variable. As discussed in the last lecture:
DISTRIBUTION FUNCTION The distribution function of a random variable X, denoted by F(x), is defined by F(x) = P(X ≤ x). The function F(x) gives the probability of the event that X takes a value LESS THAN OR EQUAL TO a specified value x. The distribution function is abbreviated to d.f. and is also called the cumulative distribution function (cdf), as it accumulates the probability of the random variable X from its smallest value up to a specified value x.
EXAMPLE Find the probability distribution and distribution function for the number of heads when 3 balanced coins are tossed. Depict both the probability distribution and the distribution function graphically.
Since the coins are balanced, the equiprobable sample space for this experiment is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
Let X be the random variable that denotes the number of heads. Then the values of X are 0, 1, 2 and 3.
And their probabilities are:
f(0) = P(X = 0) = P[{TTT}] = 1/8
f(1) = P(X = 1) = P[{HTT, THT, TTH}] = 3/8
f(2) = P(X = 2) = P[{HHT, HTH, THH}] = 3/8
f(3) = P(X = 3) = P[{HHH}] = 1/8
Expressing the above information in the tabular form, we obtain the desired probability distribution of X as follows:
Number of Heads (xi)    Probability f(xi)
0                       1/8
1                       3/8
2                       3/8
3                       1/8
Total                   1
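The table above can be reproduced by brute-force enumeration of the sample space. The following is only a minimal sketch; the variable names (`sample_space`, `counts`, `pmf`) are illustrative choices and not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 8 equally likely outcomes of tossing 3 balanced coins.
sample_space = ["".join(outcome) for outcome in product("HT", repeat=3)]

# Count how many outcomes give each number of heads.
counts = Counter(outcome.count("H") for outcome in sample_space)

# f(x) = (number of outcomes with x heads) / (total number of outcomes)
pmf = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, p)
```

Exact fractions are used so that the probabilities appear as 1/8, 3/8, 3/8 and 1/8 rather than as rounded decimals.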
The line chart of the above probability distribution is as follows:
[Figure: line chart of the probability distribution, with f(x) on the vertical axis and X = 0, 1, 2, 3 on the horizontal axis.]
In order to obtain the distribution function of this random variable, we compute the cumulative probabilities as follows:
Number of Heads (xi)    Probability f(xi)    Cumulative Probability F(xi)
0                       1/8                  1/8
1                       3/8                  1/8 + 3/8 = 4/8
2                       3/8                  4/8 + 3/8 = 7/8
3                       1/8                  7/8 + 1/8 = 1
Hence the desired distribution function is
\[
F(x) =
\begin{cases}
0,   & \text{for } x < 0 \\
1/8, & \text{for } 0 \le x < 1 \\
4/8, & \text{for } 1 \le x < 2 \\
7/8, & \text{for } 2 \le x < 3 \\
1,   & \text{for } x \ge 3
\end{cases}
\]
INTERPRETATION: If x < 0, we have P(X ≤ x) = 0, the reason being that it is not possible for our random variable X to assume a value less than zero. (The minimum number of heads that we can have in tossing three coins is zero.)
If 0 < x < 1, we note that it is not possible for our random variable X to assume any value between zero and one. (We will have no head or one head but we will NOT have 1/3 heads or 2/5 heads!) Hence, the probabilities of all such values will be zero, and we obtain a situation which can be explained through the following table:
Number of Heads (xi)    Probability f(xi)    Cumulative Probability F(xi)
0                       1/8                  1/8
0.2                     0                    1/8 + 0 = 1/8
0.4                     0                    1/8 + 0 = 1/8
0.6                     0                    1/8 + 0 = 1/8
0.8                     0                    1/8 + 0 = 1/8
1                       3/8                  1/8 + 3/8 = 4/8
The above table clearly shows that the probability that X is LESS THAN OR EQUAL TO any value x lying between zero and 0.9999… is equal to the probability of X = 0, i.e.
•For 0 ≤ x < 1, we have P(X ≤ x) = P(X = 0) = 1/8;
Similarly,
•For 1 ≤ x < 2, we have P(X ≤ x) = P(X = 0) + P(X = 1) = 1/8 + 3/8 = 4/8;
•For 2 ≤ x < 3, we have P(X ≤ x) = P(X = 0) + P(X = 1) + P(X = 2) = 1/8 + 3/8 + 3/8 = 7/8;
And, finally, for x ≥ 3, we have P(X ≤ x) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 1/8 + 3/8 + 3/8 + 1/8 = 8/8 = 1.
Hence, the graph of the DISTRIBUTION FUNCTION is as follows:
[Figure: graph of the distribution function, with F(x) on the vertical axis and X on the horizontal axis.]
As this graph resembles the steps of a staircase, it is known as a step function. It is also known as a jump function (as it takes jumps at integral values of X).
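The step behaviour can be checked numerically by evaluating F(x) = P(X ≤ x) at a few points, including non-integer ones. This is a sketch only; the helper name `distribution_function` is a hypothetical choice, and the three-coin probabilities are those derived earlier in this example.

```python
from fractions import Fraction

# Probability distribution of the number of heads in 3 tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def distribution_function(x, pmf):
    """Return F(x) = P(X <= x) by summing f(xi) over all xi <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))

# F stays flat between integers and jumps exactly at 0, 1, 2 and 3.
for x in [-1, 0, 0.5, 1, 1.9, 2, 2.5, 3, 10]:
    print(x, distribution_function(x, pmf))
```

Note that `Fraction` reduces 4/8 to 1/2, so the value for 1 ≤ x < 2 is printed as 1/2.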
In some books, the graph of the distribution function is given as shown in the following figure:
[Figure: alternative depiction of the same distribution function, with F(x) on the vertical axis and X on the horizontal axis.]
In what way do we interpret the above distribution function from a REAL-LIFE point of view? If we toss three balanced coins, the probability that we obtain at the most one head is 4/8, the probability that we obtain at the most two heads is 7/8, and so on.
Let us consider another interesting example to illustrate the concepts of a discrete probability distribution and its distribution function:
EXAMPLE A large store places its last 15 clock radios in a clearance sale. Unknown to anyone, 5 of the radios are defective. If a customer tests 3 different clock radios selected at random, what is the probability distribution of X, where X represents the number of defective radios in the sample?
SOLUTION We have:
Type of Clock Radio    Number of Clock Radios
Good                   10
Defective              5
Total                  15
The total number of ways of selecting 3 radios out of 15 is
\[ \binom{15}{3}. \]
Also, the total number of ways of selecting 3 good radios (and no defective radio) is
\[ \binom{10}{3}\binom{5}{0}. \]
Hence, the probability of X = 0 is
\[ \frac{\binom{10}{3}\binom{5}{0}}{\binom{15}{3}} = 0.26. \]
The probabilities of X = 1, 2, and 3 are computed in a similar way. Hence, we obtain the following probability distribution:
Number of defective clock radios in the sample (X)    Probability f(x)
0                                                     0.26
1                                                     0.49
2                                                     0.22
3                                                     0.02
Total                                                 0.99 ≈ 1
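The four probabilities can be computed directly from the counting argument above. The sketch below assumes Python 3.8 or later (for `math.comb`); the variable names are illustrative only.

```python
from math import comb

good, defective, sample_size = 10, 5, 3

# Total number of ways of selecting 3 radios out of 15.
total_ways = comb(good + defective, sample_size)

pmf = {}
for x in range(sample_size + 1):
    # Choose x defective radios and (3 - x) good radios.
    ways = comb(defective, x) * comb(good, sample_size - x)
    pmf[x] = ways / total_ways

for x, p in pmf.items():
    print(x, round(p, 2))
```

The unrounded probabilities add up to exactly 1; the total of 0.99 in the table arises only because each entry has been rounded to two decimal places.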
The line chart of this distribution is:
[Figure: LINE CHART of the probability distribution, with f(x) on the vertical axis and X = 0, 1, 2, 3 on the horizontal axis.]
As indicated by the above diagram, it is not necessary for a probability distribution to be symmetric; it can be positively or negatively skewed.
The distribution function of the above probability distribution is obtained as follows:
Number of defective clock radios in the sample (X)    f(x)        F(x)
0                                                     0.26        0.26
1                                                     0.49        0.75
2                                                     0.22        0.97
3                                                     0.02        0.99 ≈ 1
Total                                                 0.99 ≈ 1
INTERPRETATION The probability that the sample of 3 clock radios contains at the most one defective radio is 0.75, the probability that the sample contains at the most two defective radios is 0.97, and so on.
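The cumulative column is simply a running total of f(x). The sketch below builds it in two ways: from the rounded table entries, and from the exact probabilities. It suggests that the figures 0.75 and 0.97 carry a small accumulated rounding error, since the exact cumulative probabilities round to 0.76 and 0.98. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

# Running total of the rounded probabilities, as in the table.
rounded_pmf = [0.26, 0.49, 0.22, 0.02]
rounded_cdf = [round(c, 2) for c in accumulate(rounded_pmf)]

# Running total of the exact probabilities C(5, x) * C(10, 3 - x) / C(15, 3).
total_ways = comb(15, 3)
exact_pmf = [Fraction(comb(5, x) * comb(10, 3 - x), total_ways) for x in range(4)]
exact_cdf = list(accumulate(exact_pmf))

print(rounded_cdf)
print([round(float(c), 2) for c in exact_cdf])
```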
Next, we consider the concept of MATHEMATICAL EXPECTATION.
Let a discrete random variable X have possible values x1, x2, …, xn with corresponding probabilities f(x1), f(x2), …, f(xn) such that Σf(xi) = 1. Then the mathematical expectation or the expectation or the expected value of X, denoted by E(X), is defined as
\[ E(X) = x_1 f(x_1) + x_2 f(x_2) + \dots + x_n f(x_n) = \sum_{i=1}^{n} x_i f(x_i). \]
E(X) is also called the mean of X and is usually denoted by the letter µ.
The expression
\[ E(X) = \sum_{i=1}^{n} x_i f(x_i) \]
may be regarded as a weighted mean of the variable’s possible values x1, x2, …, xn, each being weighted by the respective probability.
In case the values are equally likely,
\[ E(X) = \frac{1}{n} \sum_{i=1}^{n} x_i, \]
which represents the ordinary arithmetic mean of the n possible values.
It should be noted that E(X) is the average value of the random variable X over a VERY LARGE number of trials.
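This long-run interpretation can be illustrated by simulation. The sketch below borrows the three-coin experiment from earlier in the lecture, for which E(X) = 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8) = 1.5. The seed and the number of trials are arbitrary assumptions made only so that the run is repeatable.

```python
import random

random.seed(0)  # arbitrary fixed seed, chosen only for repeatability

def number_of_heads():
    """Toss 3 balanced coins once and return the number of heads."""
    return sum(random.random() < 0.5 for _ in range(3))

trials = 100_000
average = sum(number_of_heads() for _ in range(trials)) / trials

# The average over many trials should lie close to E(X) = 1.5.
print(average)
```

With fewer trials the average tends to wander further from 1.5; it is only over a very large number of trials that it settles near the expected value.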
Let us now consider an interesting example:
EXAMPLE If it rains, an umbrella salesman can earn $ 30 per day. If it is fair, he can lose $ 6 per day. What is his expectation if the probability of rain is 0.3?
SOLUTION Let X represent the number of dollars the salesman earns. Then X is a random variable with possible values 30 and –6 (where –6 corresponds to the fact that the salesman loses), and the corresponding probabilities are 0.3 and 0.7 respectively. Hence, we have:
EVENT      AMOUNT EARNED ($) x    PROBABILITY P(x)
Rain       30                     0.3
No Rain    –6                     0.7
Total                             1
In order to compute the expected value of X, we carry out the following computation:
EVENT      AMOUNT EARNED ($) x    PROBABILITY P(x)    xP(x)
Rain       30                     0.3                 9.0
No Rain    –6                     0.7                 –4.2
Total                             1                   4.8
Hence E(X) = $ 4.80 per day i.e. on the average, the salesman can expect to earn 4.8 dollars per day.
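The same computation can be written as a short function that sums x·P(x) over the distribution. This is a minimal sketch; the function name `expected_value` and the list-of-pairs representation are illustrative assumptions.

```python
def expected_value(distribution):
    """Return E(X), the sum of x * P(x) over all (x, P(x)) pairs."""
    return sum(x * p for x, p in distribution)

# (amount earned in dollars, probability) for rain and no rain
salesman = [(30, 0.3), (-6, 0.7)]

print(round(expected_value(salesman), 2))  # 4.8
```

The result is rounded before printing because decimal probabilities such as 0.7 are not stored exactly in floating-point arithmetic.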
Until now, we have considered the mathematical expectation of the random variable X. But, in many situations, we may be interested in the mathematical expectation of some FUNCTION of X:
EXPECTATION OF A FUNCTION OF A RANDOM VARIABLE Let H(X) be a function of the random variable X. Then H(X) is also a random variable and also has an expected value, (as any function of a random variable is also a random variable).
If X is a discrete random variable with probability distribution f(x), then, since H(X) takes the value H(xi) when X = xi, the expected value of the function H(X) is
\[ E[H(X)] = H(x_1) f(x_1) + H(x_2) f(x_2) + \dots + H(x_n) f(x_n) = \sum_i H(x_i) f(x_i), \]
provided the series converges absolutely.
In particular, if H(X) = X², then
\[ E(X^2) = \sum_i x_i^2 f(x_i). \]
It is relevant to note that E(X²) is not the same as [E(X)]².
Again, if H(X) = (X – µ)², where µ is the population mean, then
\[ E[(X - \mu)^2] = \sum_i (x_i - \mu)^2 f(x_i). \]
We call this expected value the variance and denote it by Var(X) or σ².
And, since E[(X – µ)²] = E(X²) – [E(X)]² (as can be seen by expanding the square and using Σ xi f(xi) = µ and Σ f(xi) = 1), the short cut formula for the variance is
\[ \sigma^2 = E(X^2) - [E(X)]^2. \]
The positive square root of the variance, as before, is called the standard deviation.
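As a check, the variance can be computed both from the definition and from the short cut formula. The sketch below uses the three-coin distribution from earlier in the lecture; the variable names are illustrative only.

```python
from fractions import Fraction
from math import sqrt

# Probability distribution of the number of heads in 3 tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(x * p for x, p in pmf.items())                # E(X)
mean_of_squares = sum(x**2 * p for x, p in pmf.items())  # E(X^2)

# Variance from the definition E[(X - mu)^2] ...
variance_by_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
# ... and from the short cut formula E(X^2) - [E(X)]^2.
variance_by_shortcut = mean_of_squares - mean**2

print(mean, mean_of_squares)                         # 3/2 3
print(variance_by_definition, variance_by_shortcut)  # 3/4 3/4
print(round(sqrt(variance_by_shortcut), 3))          # 0.866
```

Here E(X²) = 3 while [E(X)]² = 9/4, which also illustrates that the two quantities are not the same.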
More generally, if H(X) = X^k, k = 1, 2, 3, …, then
\[ E(X^k) = \sum_i x_i^k f(x_i), \]
which we call the kth moment about the origin of the random variable X and we denote it by µ′_k.
Similarly, if H(X) = (X – µ)^k, k = 1, 2, 3, …, then we get an expected value, called the kth moment about the mean of the random variable X, which we denote by µ_k. That is:
\[ \mu_k = E[(X - \mu)^k] = \sum_i (x_i - \mu)^k f(x_i). \]
The skewness of a probability distribution is often measured by
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \]
and kurtosis by
\[ \beta_2 = \frac{\mu_4}{\mu_2^2}. \]
These moment-ratios assist us in determining the skewness and kurtosis of our probability distribution in exactly the same way as was discussed in the case of frequency distributions.
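To make the moment-ratios concrete, the sketch below evaluates them for the three-coin distribution used earlier in the lecture. The helper names `raw_moment` and `central_moment` are hypothetical; they implement the moments about the origin and about the mean respectively.

```python
from fractions import Fraction

# Probability distribution of the number of heads in 3 tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def raw_moment(pmf, k):
    """kth moment about the origin: the sum of x^k * f(x)."""
    return sum(x**k * p for x, p in pmf.items())

def central_moment(pmf, k):
    """kth moment about the mean: the sum of (x - mu)^k * f(x)."""
    mu = raw_moment(pmf, 1)
    return sum((x - mu) ** k * p for x, p in pmf.items())

mu2 = central_moment(pmf, 2)
mu3 = central_moment(pmf, 3)
mu4 = central_moment(pmf, 4)

beta1 = mu3**2 / mu2**3  # skewness measure
beta2 = mu4 / mu2**2     # kurtosis measure

print(mu2, mu3, mu4)  # 3/4 0 21/16
print(beta1, beta2)   # 0 7/3
```

The value β1 = 0 agrees with the symmetry of this distribution about its mean of 1.5.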
Next, we discuss some important properties of the expected values of a random variable:
PROPERTIES OF MATHEMATICAL EXPECTATION 1. If c is a constant, then E(c) = c. Thus the expected value of a constant is the constant itself.
This point can be understood easily by considering the following interesting example: Suppose that a very difficult test was given to students by a professor, and that every student obtained 2 marks out of 20! It is obvious that the mean mark is also 2. Since the variable ‘marks’ was a constant, its expected value was equal to that constant.
2. If X is a discrete random variable and if a and b are constants, then E(aX + b) = a E(X) + b.
Let us verify this from the following example:
EXAMPLE Let X represent the number of heads that appear when three fair coins are tossed. The probability distribution of X is:
x        P(x)
0        1/8
1        3/8
2        3/8
3        1/8
Total    1
The expected value of X is obtained as follows:
x        P(x)    xP(x)
0        1/8     0
1        3/8     3/8
2        3/8     6/8
3        1/8     3/8
Total    1       12/8 = 1.5
Hence, E(X) = 1.5
Suppose that we are interested in finding the expected value of the random variable 2X+3. Then we carry out the following computations:
x        2x+3    P(x)    (2x+3)P(x)
0        3       1/8     3/8
1        5       3/8     15/8
2        7       3/8     21/8
3        9       1/8     9/8
Total            1       48/8 = 6
Hence E(2X+3) = 6
It should be noted that E(2X+3) = 6 = 2(1.5) + 3 = 2E(X) + 3 i.e. E(aX + b) = a E(X) + b.
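The verification above can also be carried out in code, by computing E(aX + b) directly and comparing it with a E(X) + b. This is a sketch only; the function name `expectation` and its optional argument `h` are illustrative assumptions.

```python
from fractions import Fraction

# Probability distribution of the number of heads in 3 tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(pmf, h=lambda x: x):
    """Return E[H(X)], the sum of H(x) * P(x); H defaults to the identity."""
    return sum(h(x) * p for x, p in pmf.items())

a, b = 2, 3
direct = expectation(pmf, lambda x: a * x + b)  # E(2X + 3) computed directly
via_property = a * expectation(pmf) + b         # 2 E(X) + 3

print(direct, via_property)  # 6 6
```

Changing `a` and `b` to other constants gives the same agreement between the two computations, as the property asserts.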
IN TODAY’S LECTURE, YOU LEARNT
•Graphical Representation of the Distribution Function of a Discrete Random Variable
•Mathematical Expectation
•Mean, Variance and Moments of a Discrete Probability Distribution
•Properties of Expected Values
IN THE NEXT LECTURE, YOU WILL LEARN
•Chebychev’s Inequality
•Continuous Probability Distributions