Simple Linear Regression from Blumen in Chinese


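Chapter 10
Correlation and Regression

Outline
Introduction
10-1 Scatter Plots and Correlation
10-2 Regression
10-3 Coefficient of Determination and Standard Error of the Estimate
10-4 Multiple Regression (Optional)
Summary

Objectives
After completing this chapter, you should be able to
1 Draw a scatter plot for a set of ordered pairs.
2 Compute the correlation coefficient.
3 Test the hypothesis H0: ρ = 0.
4 Compute the equation of the regression line.
5 Compute the coefficient of determination.
6 Compute the standard error of the estimate.
7 Find a prediction interval.
8 Be familiar with the concept of multiple regression.

Introduction

In Chapters 7 and 8, two areas of inferential statistics, confidence intervals and hypothesis testing, were explained. Another area of inferential statistics involves determining whether a relationship exists between two or more numerical or quantitative variables. For example, a businessperson may want to know whether the volume of sales for a given month is related to the amount of advertising the firm does that month. Educators are interested in determining whether the number of hours a student studies is related to the student's grade in that subject. Medical researchers are interested in questions such as, Is caffeine related to heart disease? or Is there a relationship between a person's age and his or her blood pressure? A zoologist may want to know whether the birth weight of a certain animal is related to its life span. These are only a few of the many questions that can be answered by using correlation and regression.

Correlation is a statistical method used to determine whether a linear relationship between variables exists. Regression is a statistical method used to describe the nature of the relationship between variables, that is, positive or negative, linear or nonlinear.

The purpose of this chapter is to answer these questions statistically:
1. Are two or more variables linearly related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?

Unusual Stat: A person walks on average 100,000 miles in his or her lifetime. This is about 3.4 miles per day.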

To answer the first two questions, statisticians use a numerical measure to determine whether two variables are linearly related and to determine the strength of the linear relationship. This measure is called a correlation coefficient. For example, there are many variables associated with heart disease, among them lack of exercise, smoking, heredity, age, stress, and diet. Of these variables, some are more important than others; therefore, a physician who wants to help a patient must know which factors are most important.

To answer the third question, you must ascertain what type of relationship exists. There are two types of relationships: simple and multiple. In a simple relationship, there are two variables: an independent variable, also called an explanatory variable or a predictor variable, and a dependent variable, also called a response variable. A simple relationship analysis is called simple regression, and there is one independent variable that is used to predict the dependent variable. For example, a manager may wish to see whether the number of years the salespeople have been working for the company has anything to do with the amount of sales they make. This type of study involves a simple relationship, since there are only two variables: years of experience and amount of sales.

In a multiple relationship, called multiple regression, two or more independent variables are used to predict one dependent variable. For example, an educator may wish to investigate the relationship between a student's success in college and factors such as the number of hours devoted to studying, the student's GPA, and the student's high school background. This type of study involves several variables.

Simple relationships can also be positive or negative. A positive relationship exists when both variables increase or decrease at the same time. For instance, a person's height and weight are related; and the relationship is positive, since the taller a person is, generally, the more the person weighs. In a negative relationship, as one variable increases, the other variable decreases, and vice versa. For example, if you measure the strength of people over 60 years of age, you will find that as age increases, strength generally decreases. The word generally is used here because there are exceptions.

Finally, the fourth question asks what type of predictions can be made. Predictions are made in all areas and daily. Examples include weather forecasting, stock market analyses, sales predictions, crop predictions, gasoline price predictions, and sports predictions. Some predictions are more accurate than others, due to the strength of the relationship. That is, the stronger the relationship is between variables, the more accurate the prediction is.

10-1 Scatter Plots and Correlation

Objective 1: Draw a scatter plot for a set of ordered pairs.

In simple correlation and regression studies, the researcher collects data on two numerical or quantitative variables to see whether a relationship exists between the variables. For example, if a researcher wishes to see whether there is a relationship between number of hours of study and test scores on an exam, she must select a random sample of students, determine the hours each studied, and obtain their grades on the exam. A table can be made for the data, as shown here.

Student   Hours of study x   Grade y (%)
A         6                  82
B         2                  63
C         1                  57
D         5                  88
E         2                  68
F         3                  75

As stated previously, the two variables for this study are called the independent variable and the dependent variable. The independent variable is the variable in regression that can be controlled or manipulated. In this case, the number of hours of study is the independent variable and is designated as the x variable. The dependent variable is the variable in regression that cannot be controlled or manipulated. The grade the student received on the exam is the dependent variable, designated as the y variable. The reason for this distinction between the variables is that you assume that the grade the student earns depends on the number of hours the student studied. Also, you assume that, to some extent, the student can regulate or control the number of hours he or she studies for the exam.

The determination of the x and y variables is not always clear-cut and is sometimes an arbitrary decision. For example, if a researcher studies the effects of age on a person's blood pressure, the researcher can generally assume that age affects blood pressure. Hence, the variable age can be called the independent variable, and the variable blood pressure can be called the dependent variable. On the other hand, if a researcher is studying the attitudes of husbands on a certain issue and the attitudes of their wives on the same issue, it is difficult to say which variable is the independent variable and which is the dependent variable. In this study, the researcher can arbitrarily designate the variables as independent and dependent.

The independent and dependent variables can be plotted on a graph called a scatter plot. The independent variable x is plotted on the horizontal axis, and the dependent variable y is plotted on the vertical axis.

A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.

The scatter plot is a visual way to describe the nature of the relationship between the independent and dependent variables. The scales of the variables can be different, and the coordinates of the axes are determined by the smallest and largest data values of the variables.

The procedure for drawing a scatter plot is shown in Examples 10-1 through 10-3.

Example 10-1 Car Rental Companies

Construct a scatter plot for the data shown for car rental companies in the United States for a recent year.

Company   Cars x (in ten thousands)   Revenue y (in billions)
A         63.0                        $7.0
B         29.0                         3.9
C         20.8                         2.1
D         19.1                         2.8
E         13.4                         1.4
F          8.5                         1.5

Source: Auto Rental News.

Solution
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown in Figure 10-1.

Figure 10-1 Scatter Plot for Example 10-1 (x axis: Cars (in 10,000s); y axis: Revenue (billions))
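The two plotting steps can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the text's procedure: the helper name `text_scatter` and the grid size are assumptions, and the function draws a coarse character plot instead of a real graph. As described above, the axis ranges are taken from the smallest and largest data values, so the sketch assumes the x values (and the y values) are not all identical.

```python
def text_scatter(xs, ys, width=40, height=10):
    """Return a coarse text scatter plot; each '*' marks one (x, y) pair.

    Hypothetical helper for illustration. Axis ranges run from the smallest
    to the largest data value of each variable.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Data from Example 10-1
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]   # x, in ten thousands
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]     # y, in billions of dollars

plot = text_scatter(cars, revenue)
print(plot)
```

The points rise from the lower left to the upper right, which is the pattern to look for when judging whether a relationship is positive.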

Example 10-2 Absences and Final Grades

Construct a scatter plot for the data obtained in a study on the number of absences and the final grades of seven randomly selected students from a statistics class. The data are shown here.

Student   Number of absences x   Final grade y (%)
A          6                     82
B          2                     86
C         15                     43
D          9                     74
E         12                     58
F          5                     90
G          8                     78

Solution
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown in Figure 10-2.

Figure 10-2 Scatter Plot for Example 10-2 (x axis: Number of absences; y axis: Final grade)

Example 10-3 Age and Wealth

A researcher wishes to see if there is a relationship between the ages and net worth of the wealthiest people in America. The data for a specific year are shown.

Person   Age x   Net wealth y ($ billions)
A        73      16
B        65      26
C        53      50
D        54      21.5
E        79      40
F        69      16
G        61      19.6
H        65      19

Source: Forbes magazine.

Solution
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown in Figure 10-3.

Figure 10-3 Scatter Plot for Example 10-3 (x axis: Age; y axis: Wealth ($ billions))

After the plot is drawn, it should be analyzed to determine which type of relationship, if any, exists. For example, the plot shown in Figure 10-1 suggests a positive relationship, since as the number of cars rented increases, revenue tends to increase also. The plot of the data shown in Figure 10-2 suggests a negative relationship, since as the number of absences increases, the final grade decreases. Finally, the plot of the data shown in Figure 10-3 shows no specific type of relationship, since no pattern is discernible.

Note that the data shown in Figures 10-1 and 10-2 also suggest a linear relationship, since the points seem to fit a straight line, although not perfectly. Sometimes a scatter plot, such as the one in Figure 10-4, shows a curvilinear relationship between the data. In this situation, the methods shown in this section and in Section 10-2 cannot be used. Methods for curvilinear relationships are beyond the scope of this book.

Figure 10-4 Scatter Plot Suggesting a Curvilinear Relationship

Correlation

Objective 2: Compute the correlation coefficient.

Correlation Coefficient As stated in the Introduction, statisticians use a measure called the correlation coefficient to determine the strength of the linear relationship between two variables. There are several types of correlation coefficients. The one explained in this section is called the Pearson product moment correlation coefficient (PPMC), named after statistician Karl Pearson, who pioneered the research in this area.

The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two quantitative variables. The symbol for the sample correlation coefficient is r. The symbol for the population correlation coefficient is ρ (Greek letter rho).

The range of the correlation coefficient is from −1 to +1. If there is a strong positive linear relationship between the variables, the value of r will be close to +1. If there is a strong negative linear relationship between the variables, the value of r will be close to −1. When there is no linear relationship between the variables or only a weak relationship, the value of r will be close to 0. See Figure 10-5.

Figure 10-5 Range of Values for the Correlation Coefficient (−1: strong negative linear relationship; 0: no linear relationship; +1: strong positive linear relationship)

The graphs in Figure 10-6 show the relationship between the correlation coefficients and their corresponding scatter plots. Notice that as the value of the correlation coefficient increases from 0 to +1 (parts a, b, and c), the data values lie closer to a straight line, which indicates an increasingly strong positive linear relationship. As the value of the correlation coefficient decreases from 0 to −1 (parts d, e, and f), the data values also lie closer to a straight line. Again this suggests a stronger relationship, this time a negative one.

Figure 10-6 Relationship Between the Correlation Coefficient and the Scatter Plot: (a) r = 0.50, (b) r = 0.90, (c) r = 1.00, (d) r = −0.50, (e) r = −0.90, (f) r = −1.00

There are several ways to compute the value of the correlation coefficient. One method is to use the formula shown here.

Formula for the Correlation Coefficient r

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} $$

where n is the number of data pairs.
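As a minimal sketch of how this formula translates into code, the function below computes the five sums and substitutes them directly. The name `pearson_r` is a hypothetical choice for illustration, and the data are the study hours and grades of students A through F from the table at the start of this section.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r from the sums formula (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)                                   # number of data pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hours of study (x) and exam grade (y) for students A-F
hours = [6, 2, 1, 5, 2, 3]
grades = [82, 63, 57, 88, 68, 75]

r = pearson_r(hours, grades)
print(round(r, 3))
```

The denominator is zero when all x values (or all y values) are identical, so the sketch assumes each variable actually varies in the sample.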

Assumptions for the Correlation Coefficient
1. The sample is a random sample.
2. The data pairs fall approximately on a straight line and are measured at the interval or ratio level.
3. The variables have a joint normal distribution. (This means that given any specific value of x, the y values are normally distributed; and given any specific value of y, the x values are normally distributed.)

Rounding Rule for the Correlation Coefficient Round the value of r to three decimal places.

The formula looks somewhat complicated, but using a table to compute the values, as shown in Example 10-4, makes it somewhat easier to determine the value of r.

There are no units associated with r, and the value of r will remain unchanged if the x and y values are switched.
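Both properties, that r has no units and that switching x and y leaves it unchanged, can be checked numerically. The sketch below uses the car rental data from Example 10-1; the helper `pearson_r` and the rescaling to single cars are illustrative assumptions, not something the text prescribes.

```python
from math import isclose, sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r from the sums formula (illustrative helper)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]     # in ten thousands
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]       # in billions of dollars

r_xy = pearson_r(cars, revenue)
r_yx = pearson_r(revenue, cars)                # x and y switched
cars_single = [c * 10_000 for c in cars]       # same data, counted in single cars
r_rescaled = pearson_r(cars_single, revenue)

# Compare with a tolerance, since floating-point sums are not exact
print(isclose(r_xy, r_yx), isclose(r_xy, r_rescaled))
```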

Example 10-4 Car Rental Companies

Compute the correlation coefficient for the data in Example 10-1.

Solution
Step 1 Make a table as shown here.

Company   Cars x (in ten thousands)   Revenue y (in billions)   xy   x²   y²
A         63.0                        7.0
B         29.0                        3.9
C         20.8                        2.1
D         19.1                        2.8
E         13.4                        1.4
F          8.5                        1.5

Step 2 Find the values of xy, x², and y² and place these values in the corresponding columns of the table. The completed table is shown.

Company   Cars x (in 10,000s)   Revenue y (in billions)   xy       x²        y²
A         63.0                  7.0                       441.00   3969.00   49.00
B         29.0                  3.9                       113.10    841.00   15.21
C         20.8                  2.1                        43.68    432.64    4.41
D         19.1                  2.8                        53.48    364.81    7.84
E         13.4                  1.4                        18.76    179.56    1.96
F          8.5                  1.5                        12.75     72.25    2.25

Σx = 153.8   Σy = 18.7   Σxy = 682.77   Σx² = 5859.26   Σy² = 80.67

Step 3 Substitute in the formula and solve for r.

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} = \frac{(6)(682.77) - (153.8)(18.7)}{\sqrt{[(6)(5859.26) - (153.8)^2][(6)(80.67) - (18.7)^2]}} = 0.982 $$

The correlation coefficient suggests a strong positive relationship between the number of cars a rental agency has and its annual revenue.

Example 10-5 Absences and Final Grades

Compute the value of the correlation coefficient for the data obtained in the study of the number of absences and the final grade of the seven students in the statistics class given in Example 10-2.

Solution
Step 1 Make a table.
Step 2 Find the values of xy, x², and y²; place these values in the corresponding columns of the table.

Student   Number of absences x   Final grade y (%)   xy    x²    y²
A          6                     82                  492    36   6,724
B          2                     86                  172     4   7,396
C         15                     43                  645   225   1,849
D          9                     74                  666    81   5,476
E         12                     58                  696   144   3,364
F          5                     90                  450    25   8,100
G          8                     78                  624    64   6,084

Σx = 57   Σy = 511   Σxy = 3745   Σx² = 579   Σy² = 38,993

Step 3 Substitute in the formula and solve for r.

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} = \frac{(7)(3745) - (57)(511)}{\sqrt{[(7)(579) - (57)^2][(7)(38{,}993) - (511)^2]}} = -0.944 $$

The value of r suggests a strong negative relationship between a student's final grade and the number of absences a student has. That is, the more absences a student has, the lower is his or her grade.
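The table method of Steps 1 through 3 can be mirrored in code: build one row per student with the xy, x², and y² columns, total each column, and substitute the totals into the formula. This is a sketch for checking the arithmetic of Example 10-5; the variable names are illustrative.

```python
from math import sqrt

# Data from Example 10-2
absences = [6, 2, 15, 9, 12, 5, 8]        # x
grades = [82, 86, 43, 74, 58, 90, 78]     # y (%)

# One row per student: (x, y, xy, x^2, y^2), mirroring the table columns
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(absences, grades)]

# Column totals: sum of x, y, xy, x^2, y^2
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))
n = len(rows)

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

for student, row in zip("ABCDEFG", rows):
    print(student, *row)
print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print("r =", round(r, 3))
```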

Example 10-6 Age and Wealth

Compute the value of the correlation coefficient for the data given in Example 10-3 for the age and wealth of the richest persons in the United States.

Solution
Step 1 Make a table.
Step 2 Find the values of xy, x², and y², and place these values in the corresponding columns of the table.

Person   Age x   Net wealth y   xy        x²      y²
A        73      16             1,168     5,329     256
B        65      26             1,690     4,225     676
C        53      50             2,650     2,809   2,500
D        54      21.5           1,161     2,916     462.25
E        79      40             3,160     6,241   1,600
F        69      16             1,104     4,761     256
G        61      19.6           1,195.6   3,721     384.16
H        65      19             1,235     4,225     361

Σx = 519   Σy = 208.1   Σxy = 13,363.6   Σx² = 34,227   Σy² = 6,495.41

Step 3 Substitute in the formula and solve for r.

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} = \frac{8(13{,}363.6) - (519)(208.1)}{\sqrt{[8(34{,}227) - (519)^2][8(6495.41) - (208.1)^2]}} = \frac{-1095.1}{\sqrt{(4455)(8657.67)}} = \frac{-1095.1}{6210.469} = -0.176 $$

The value of r indicates a very weak negative linear relationship between the variables.

In Example 10-4, the value of r was high (close to 1.00); in Example 10-6, the value of r was much lower (close to 0). This question then arises, When is the value of r due to chance, and when does it suggest a significant linear relationship between the variables? This question will be answered next.

Objective 3: Test the hypothesis H0: ρ = 0.

The Significance of the Correlation Coefficient As stated before, the range of the correlation coefficient is between −1 and +1. When the value of r is near +1 or −1, there is a strong linear relationship. When the value of r is near 0, the linear relationship is weak or nonexistent. Since the value of r is computed from data obtained from samples, there are two possibilities when r is not equal to zero: either the value of r is high enough to conclude that there is a significant linear relationship between the variables, or the value of r is due to chance.

To make this decision, you use a hypothesis-testing procedure. The traditional method is similar to the one used in previous chapters.
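To see why a nonzero r can be due to chance alone, one can simulate many small samples in which x and y are generated independently, so the population correlation is 0 by construction. The sketch below is an illustration under assumed settings that do not come from the text: standard normal data, 6 pairs per sample, 2000 samples, a cutoff of 0.5, and a fixed seed.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r from the sums formula (illustrative helper)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


rng = random.Random(0)            # fixed seed so the run is repeatable
n_pairs, n_trials = 6, 2000       # assumed settings for the illustration

chance_rs = []
for _ in range(n_trials):
    xs = [rng.gauss(0, 1) for _ in range(n_pairs)]
    ys = [rng.gauss(0, 1) for _ in range(n_pairs)]   # generated independently of xs
    chance_rs.append(pearson_r(xs, ys))

# Share of samples whose r looks moderately large even though rho = 0
share_large = sum(abs(r) > 0.5 for r in chance_rs) / n_trials
print(f"share of samples with |r| > 0.5: {share_large:.2f}")
```

With so few pairs, a sizeable share of unrelated samples shows a moderately large r, which is why a formal test is needed before calling a sample correlation significant.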

Step 1 State the hypotheses.
Step 2 Find the critical values.
Step 3 Compute the test value.
Step 4 Make the decision.
Step 5 Summarize the results.

The population correlation coefficient is computed from taking all possible (x, y) pairs; it is designated by the Greek letter ρ (rho). The sample correlation coefficient can then be used as an estimator of ρ if the following assumptions are valid.

1. The variables x and y are linearly related.
2. The variables are random variables.
3. The two variables have a bivariate normal distribution.

A bivariate normal distribution means that for the pairs of (x, y) data values, the corresponding y values have a bell-shaped distribution for any given x value, and the x values for any given y value have a bell-shaped distribution.

Formally defined, the population correlation coefficient ρ is the correlation computed by using all possible pairs of data values (x, y) taken from a population.

Interesting Fact: Scientists think that a person is never more than 3 feet away from a spider at any given time!

In hypothesis testing, one of these is true:

H0: ρ = 0   This null hypothesis means that there is no correlation between the x and y variables in the population.
H1: ρ ≠ 0   This alternative hypothesis means that there is a significant correlation between the variables in the population.

When the null hypothesis is rejected at a specific level, it means that there is a significant difference between the value of r and 0. When the null hypothesis is not rejected, it means that the value of r is not significantly different from 0 (zero) and is probably due to chance.

Several methods can be used to test the significance of the correlation coefficient. Three methods will be shown in this section. The first uses the t test.

t 檢定的公式 istorical Notes H相關係數

Formula for the t Test for the Correlation Coefficient

Several methods can be used to test the significance of the correlation coefficient. Three methods will be shown in this section. The first uses the t test.

Formula for the t Test for the Correlation Coefficient

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

with degrees of freedom equal to n − 2.

Historical Notes
A mathematician named Karl Pearson (1857–1936) became interested in Francis Galton's work and saw that the correlation and regression theory could be applied to other areas besides heredity. Pearson developed the correlation coefficient that bears his name.

Although hypothesis tests can be one-tailed, most hypotheses involving the correlation coefficient are two-tailed. Recall that ρ represents the population correlation coefficient. Also, if there is no linear relationship, the value of the correlation coefficient will be 0. Hence, the hypotheses will be

H0: ρ = 0 and H1: ρ ≠ 0

You do not have to identify the claim here, since the question will always be whether there is a significant linear relationship between the variables.

The two-tailed critical values are used. These values are found in Table F in Appendix C. Also, when you are testing the significance of a correlation coefficient, both variables x and y must come from normally distributed populations.

Example 10–7
Test the significance of the correlation coefficient found in Example 10–4. Use α = 0.05 and r = 0.982.

Solution

Step 1 State the hypotheses.

H0: ρ = 0 and H1: ρ ≠ 0

Step 2 Find the critical values. Since α = 0.05 and there are 6 − 2 = 4 degrees of freedom, the critical values obtained from Table F are ±2.776, as shown in Figure 10–7.

Figure 10–7 Critical Values for Example 10–7 (t distribution centered at 0 with critical values at −2.776 and +2.776)

Step 3 Compute the test value.

$$t = r\sqrt{\frac{n-2}{1-r^2}} = 0.982\sqrt{\frac{6-2}{1-(0.982)^2}} \approx 10.4$$

Step 4 Make the decision. Reject the null hypothesis, since the test value falls in the critical region, as shown in Figure 10–8.

Figure 10–8 Test Value for Example 10–7 (critical values at −2.776 and +2.776; the test value +10.4 lies in the right critical region)

Step 5 Summarize the results. There is a significant relationship between the number of cars rented and the company's revenue.
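The arithmetic in Example 10–7 can be checked with a few lines of Python. This is a minimal sketch, not part of the textbook: the function name `t_statistic` is made up for illustration, and the critical value 2.776 is simply copied from the example rather than computed.

```python
import math

def t_statistic(r, n):
    """t test value for a sample correlation coefficient r based on n data pairs."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Values taken from Example 10-7
r, n = 0.982, 6
critical_value = 2.776  # two-tailed, alpha = 0.05, d.f. = 4 (copied from the example)

t = t_statistic(r, n)
reject_h0 = abs(t) > critical_value  # two-tailed decision rule

print(f"d.f. = {n - 2}, t = {t:.1f}, reject H0: {reject_h0}")
```

Because the test is two-tailed, the sketch compares the absolute value of t with the positive critical value.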

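The second method that can be used to test the significance of r is the P-value method. The method is the same as that shown in Chapters 8 and 9. It uses the following steps.

Step 1 State the hypotheses.
Step 2 Compute the test value. (In this case, use the t test.)
Step 3 Find the P-value. (In this case, use Table F.)
Step 4 Make the decision.
Step 5 Summarize the results.

Consider an example where t = 4.059 and d.f. = 4. Using Table F with d.f. = 4 and the two-tailed row, the value 4.059 falls between 3.747 and 4.604; hence, 0.01 < P-value < 0.02. (The P-value obtained from a calculator is 0.015.) The decision, then, is to reject the null hypothesis, since P-value < 0.05.

The third method of testing the significance of r is to use Table I in Appendix C. This table shows which values of the correlation coefficient are significant for a specific α level and a specific number of degrees of freedom. For example, for 7 degrees of freedom and α = 0.05, the table gives a critical value of 0.666. Any value of r greater than +0.666 or less than −0.666 will be significant, and the null hypothesis will be rejected. See Figure 10–9. When Table I is used, you do not have to compute the t test value. Table I is for two-tailed tests only.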
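The calculator P-value quoted for t = 4.059 with d.f. = 4 can be reproduced without a statistics library. The sketch below is one possible approach, not the textbook's: it integrates the density of Student's t distribution numerically with Simpson's rule. The function names `t_pdf` and `two_tailed_p` and the step count are assumptions chosen for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed P-value via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area   # area in both tails

p = two_tailed_p(4.059, 4)
print(f"P-value = {p:.3f}")   # lies between 0.01 and 0.02
print("reject H0:", p < 0.05)
```

The result falls inside the bracket 0.01 < P-value < 0.02 read from the table, so the table lookup and the numerical value lead to the same decision.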

Figure 10–9 Finding the Critical Value from Table I (row d.f. = 7, column α = 0.05: critical value 0.666)

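Example 10–8
Using Table I, test the significance at α = 0.01 of the correlation coefficient r = −0.176 obtained in Example 10–6.

Solution

H0: ρ = 0 and H1: ρ ≠ 0

Since the sample size is 8, there are n − 2 = 8 − 2 = 6 degrees of freedom. When α = 0.01 and d.f. = 6, the critical value obtained from Table I is 0.834. For a significant linear relationship, the value of r must be greater than +0.834 or less than −0.834. Since r = −0.176 lies between −0.834 and +0.834, the null hypothesis is not rejected. Hence, there is not enough evidence to say that there is a significant linear relationship between age and wealth. See Figure 10–10.

Figure 10–10 Rejection and Nonrejection Regions for Example 10–8 (scale from −1 to +1; reject below −0.834, do not reject between −0.834 and +0.834, reject above +0.834; r = −0.176 falls in the nonrejection region)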
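The table of critical r values and the t test are two views of the same test. Solving t = r√((n − 2)/(1 − r²)) for r gives r = t/√(t² + d.f.), so a critical t value can be converted into a critical r value. The sketch below illustrates this. The function name `critical_r` is made up, and the critical t values 2.365 and 3.707 are standard two-tailed t-table entries supplied here as assumptions rather than taken from the text.

```python
import math

def critical_r(t_critical, df):
    """Critical correlation coefficient implied by a two-tailed critical t value."""
    return t_critical / math.sqrt(t_critical ** 2 + df)

# Two-tailed critical t values from a standard t table (assumed inputs)
r_crit_7 = critical_r(2.365, 7)   # alpha = 0.05, d.f. = 7
r_crit_6 = critical_r(3.707, 6)   # alpha = 0.01, d.f. = 6
print(round(r_crit_7, 3), round(r_crit_6, 3))

# Decision for Example 10-8: reject only if |r| exceeds the critical value
r = -0.176
reject_h0 = abs(r) > r_crit_6
print("reject H0:", reject_h0)
```

The two converted values agree with the critical values 0.666 and 0.834 quoted in the text, which is why no t test value is needed when the table of critical r values is used.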

Correlation and Causation
Researchers must understand the nature of the linear relationship between the independent variable x and the dependent variable y. When a hypothesis test indicates that a significant linear relationship exists between the variables, researchers must consider the possibilities outlined next.

Possible Relationships Between Variables
When the null hypothesis has been rejected for a specific α value, any of the following five possibilities can exist.
1. There is a direct cause-and-effect relationship between the variables. That is, x causes y. For example, water causes plants to grow, poison causes death, and heat causes ice to melt.
2. There is a reverse cause-and-effect relationship between the variables. That is, y causes x. For example, suppose a researcher believes excessive coffee consumption causes nervousness, but the researcher fails to consider that the reverse situation may occur. That is, it may be that an extremely nervous person craves coffee to calm his or her nerves.
3. The relationship between the variables may be caused by a third variable. For example, if a statistician correlated the number of deaths due to drowning and the number of cans of soft drink consumed daily during the summer, he or she would probably find a significant relationship. However, the soft drink is not necessarily responsible for the deaths, since both variables may be related to heat and humidity.
4. There may be a complexity of interrelationships among many variables. For example, a researcher may find a significant relationship between students' high school grades and college grades. But there are probably many other variables involved, such as IQ, hours of study, influence of parents, motivation, age, and instructors.
5. The relationship may be coincidental. For example, a researcher may be able to find a significant relationship between the number of people who exercise and the number of people who commit crimes. But common sense dictates that any relationship between these two values must be due to coincidence.

When two variables are highly correlated, item 3 in the box states that there exists a possibility that the correlation is due to a third variable. If this is the case and the third variable is unknown to the researcher or not accounted for in the study, it is called a lurking variable. An attempt should be made by the researcher to identify such variables and to use methods to control their influence.

It is important to restate the fact that even if the correlation between two variables is high, it does not necessarily mean causation. There are other possibilities, such as lurking variables or just a coincidental relationship. See the Speaking of Statistics article on page 548.

Also, you should be cautious when the data for one or both of the variables involve averages rather than individual data. It is not wrong to use averages, but the results cannot be generalized to individuals, since averaging tends to smooth out the variability among individual data values. The result could be a higher correlation than actually exists.

Thus, when the null hypothesis is rejected, the researcher must consider all possibilities and select the appropriate one as determined by the study. Remember, correlation does not necessarily imply causation.
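The lurking-variable possibility can be made concrete with a small simulation. The sketch below uses entirely made-up data and is only an illustration of the idea: two variables x and y never influence each other, yet both depend on a third variable z (think of temperature in the drowning and soft drink example). The helper `pearson_r`, the seed, the sample size, and the noise levels are all assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # fixed seed so the run is repeatable
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]      # lurking variable
x = [zi + random.gauss(0, 0.5) for zi in z]     # depends on z only
y = [zi + random.gauss(0, 0.5) for zi in z]     # depends on z only

r_xy = pearson_r(x, y)
# Subtract the known contribution of z, then correlate what is left.
# (This works here only because the simulation adds z with coefficient 1.)
r_after = pearson_r([xi - zi for xi, zi in zip(x, z)],
                    [yi - zi for yi, zi in zip(y, z)])

print(f"r(x, y) = {r_xy:.2f}; after removing z: {r_after:.2f}")
```

In this setup the first correlation comes out high and the second close to zero, even though x never causes y. With real data the contribution of the third variable is not known in advance and would have to be estimated.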

Applying the Concepts 10–1

Stopping Distances
In a study on speed control, it was found that the main reasons for regulations were to make traffic flow more efficient and to minimize the risk of danger. An area that was focused on in the study was the distance required to completely stop a vehicle at various speeds. Use the following table to answer the questions.

MPH    Braking distance (feet)
20     20
30     45
40     81
50     133
60     205
80     411

Assume MPH is going to be used to predict stopping distance.
1. Which of the two variables is the independent variable?
2. Which is the dependent variable?
3. What type of variable is the independent variable?
4. What type of variable is the dependent variable?
5. Construct a scatter plot for the data.
6. Is there a linear relationship between the two variables?
7. Redraw the scatter plot, and change the distances between the independent-variable numbers. Does the relationship look different?
8. Is the relationship positive or negative?
9. Can braking distance be accurately predicted from MPH?
10. List some other variables that affect braking distance.
11. Compute the value of r.
12. Is r significant at α = 0.05?

See page 589 for the answers.
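Questions 11 and 12 can be worked by hand or checked with a short script. The sketch below is an illustration only. It uses the usual computational formula for Pearson's r built from the sums Σx, Σy, Σxy, Σx², and Σy², and then the t test for the correlation coefficient. The variable names are made up, and the critical value 2.776 (two-tailed, α = 0.05, d.f. = 4) is a standard t-table entry supplied as an assumption.

```python
import math

mph = [20, 30, 40, 50, 60, 80]
feet = [20, 45, 81, 133, 205, 411]

n = len(mph)
sum_x, sum_y = sum(mph), sum(feet)
sum_xy = sum(x * y for x, y in zip(mph, feet))
sum_x2 = sum(x * x for x in mph)
sum_y2 = sum(y * y for y in feet)

# Computational formula for the correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# t test for the significance of r, d.f. = n - 2
t = r * math.sqrt((n - 2) / (1 - r ** 2))
critical_value = 2.776  # assumed table value for alpha = 0.05, d.f. = 4
significant = abs(t) > critical_value

print(f"r = {r:.3f}, t = {t:.2f}, significant: {significant}")
```

A large r does not settle question 6. The scatter plot should still be inspected, since a curved pattern can also produce a high correlation coefficient.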

Exercises 10–1
1. What is meant by the statement that two variables are related?
2. What is the symbol for the sample correlation coefficient? The population correlation coefficient?
3. What is meant when the relationship between two variables is positive? Negative?
4. Give an example of a correlation study, and identify the independent and dependent variables.
5. What is the name of the correlation coefficient used in this section?
6. When two variables are correlated, can the researcher be sure which one causes the other?

For Exercises 7 through 14, perform the following steps.
a. Draw the scatter plot for the variables.
b. Compute the value of the correlation coefficient.
c. State the hypotheses.
d. Test the significance of the correlation coefficient at α = 0.05, using Table I in Appendix C.
e. Give a brief explanation of the type of relationship.

7. Commercial Movie Releases The yearly data have been published showing the number of releases for each of the commercial movie studios and the gross receipts for those studios thus far. Based on these data, can it be concluded that there is a relationship between the number of releases and the gross receipts?

No. of releases x               361   270   306   22    35   10   8    12   21
Gross receipts y (million $)    3844  1962  1371  1064  334  241  188  154  125
Source: www.showbizdata.com

8. Alumni Contributions The director of an alumni association for a small college wants to determine whether there is any type of relationship between the amount of an alumnus's contribution (in dollars) and the years the alumnus has been out of school. The data follow.

Years x           1    5    3    10   7    6
Contribution y    500  100  300  50   75   80

9. School Districts and Secondary Schools A random sample of states yielded the following numbers of local school districts and the corresponding numbers of secondary schools. Is there a significant relationship between the data?

School districts x     53   19   24   17   95   68
Secondary schools y    50   27   187  84   143  216
Source: World Almanac.

10. Egg Production Recent agricultural data showed the number of eggs produced and the price received per dozen for a given year. Based on the following data for a random selection of states, can it be concluded that a relationship exists between the number of eggs produced and the price per dozen?

No. of eggs (millions) x       957    1332   1163   1865   119    273
Price per dozen (dollars) y    0.770  0.697  0.617  0.652  1.080  1.420
Source: World Almanac.

11. Faculty and Students The number of faculty and the number of students are shown for a random selection of small colleges. Is there a significant relationship between the two variables? Switch x and y and repeat the process. Which do you think is really the independent variable?

Faculty     99    110   113   116   138   174   220
Students    1353  1290  1091  1213  1384  1283  2075
Source: World Almanac.

12. Average Temperature and Precipitation The average normal daily temperature (in degrees Fahrenheit) and the corresponding average monthly precipitation (in inches) for the month of June are shown here for seven randomly selected cities in the United States. Determine if there is a relationship between the two variables.

Avg. daily temp. x     86   81   83   89   80   74   64
Avg. mo. precip. y     3.4  1.8  3.5  3.6  3.7  1.5  0.2
Source: New York Times Almanac.

13. Fat The numbers of fat calories and grams of saturated fat in a number of fast-food nonbreakfast entrees are shown below. Is there sufficient evidence to conclude a significant relationship between the two variables?

Fat calories x    190  220  270  360  460  540
Sat. fat (g) y    9    8    13   17   23   27
Source: www.fatcalories.com

14. Hospital Beds A hospital administrator wants to see if there is a relationship between the number of licensed beds and the number of staffed beds in local hospitals. The data for a specific day are shown. Describe the relationship.

Licensed beds x    144  32  175  185  208  100  169
Staffed beds y     112  32  162  141  103  80   118
Source: Pittsburgh Tribune-Review.



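10–2 Regression

Objective 4: Compute the equation of the regression line.

In studying relationships between two variables, collect the data and then construct a scatter plot. The purpose of the scatter plot, as indicated previously, is to determine the nature of the relationship. The possibilities include a positive linear relationship, a negative linear relationship, a curvilinear relationship, or no discernible relationship. After the scatter plot is drawn, the next steps are to compute the value of the correlation coefficient and to test the significance of the relationship. If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the line that best fits the data. (Note: When r is not significant, determining the equation of the regression line and using it for prediction are meaningless.) The purpose of the regression line is to enable the researcher to see the trend and make predictions on the basis of the data.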

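Line of Best Fit

Figure 10–11 shows a scatter plot for the data of two variables. Several lines can be drawn on the graph near the points. Given a scatter plot, you must be able to draw the line of best fit. Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum. The reason you need a line of best fit is that the values of y will be predicted from the values of x; hence, the closer the points are to the line, the better the fit and the better the prediction will be. See Figure 10–12. When r is positive, the line slopes upward from left to right. When r is negative, the line slopes downward from left to right.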
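The idea of "best fit" can be made concrete with a short sketch. The data and the three candidate lines below are hypothetical and chosen only for illustration (they are not from any example in this chapter); the code computes the sum of the squared vertical distances for each candidate and keeps the smallest.

```python
def sum_squared_vertical_distances(xs, ys, a, b):
    """Sum of squared vertical distances from each point to the line y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Three candidate lines, each given as (intercept a, slope b).
candidates = [(2.2, 0.6), (1.0, 1.0), (3.0, 0.3)]

# The best-fitting candidate is the one with the smallest sum of squares.
best = min(candidates, key=lambda line: sum_squared_vertical_distances(xs, ys, *line))

for a, b in candidates:
    total = sum_squared_vertical_distances(xs, ys, a, b)
    print(f"y' = {a} + {b}x  ->  sum of squares = {total:.2f}")
print("best candidate:", best)
```

Among these three candidates the first line has the smallest sum of squares; it was chosen to be the least-squares line for this small data set, so no other straight line would do better.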

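Determination of the Regression Line Equation

In algebra, the equation of a line is usually given as y = mx + b, where m is the slope of the line and b is the y intercept. (Students who need an algebraic review of the properties of a line should read Appendix A, Section A–3, on the CD accompanying this book, before continuing.) In statistics, the equation of the regression line is written as y′ = a + bx, where a is the y′ intercept and b is the slope of the line. See Figure 10–13.

Figure 10–11 Scatter plot with three lines fit to the data

Figure 10–12 Line of best fit for a set of data points (each distance d1 through d7 is the vertical distance between an observed value and the predicted value on the line)

Figure 10–13 A line as represented in algebra and in statistics
(a) Algebra of a line: y = mx + b; for y = 0.5x + 5, the y intercept is 5 and the slope is m = Δy/Δx = 2/4 = 0.5
(b) Statistical notation for a regression line: y′ = a + bx; for y′ = 5 + 0.5x, the y′ intercept is 5 and the slope is b = Δy′/Δx = 2/4 = 0.5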

There are several methods for finding the equation of the regression line. Two formulas are given here. These formulas use the same values that are used in computing the value of the correlation coefficient. The mathematical development of these formulas is beyond the scope of this book.

Formulas for the Regression Line y′ = a + bx

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$$

$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

where a is the y′ intercept and b is the slope of the line.

Rounding Rule for the Intercept and Slope Round the values of a and b to three decimal places.
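As a minimal sketch, the two formulas can be written as one small function that takes the sums directly. The function name and parameter names are illustrative assumptions; the sums passed in are the ones used in Example 10–9.

```python
def regression_coefficients(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for y' = a + b*x, rounded to three decimal places."""
    denom = n * sum_x2 - sum_x ** 2  # shared denominator of both formulas
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return round(a, 3), round(b, 3)


# Sums from Example 10-9: n = 6, sum x = 153.8, sum y = 18.7,
# sum xy = 682.77, sum x^2 = 5859.26.
a, b = regression_coefficients(6, 153.8, 18.7, 682.77, 5859.26)
print(f"y' = {a} + {b}x")
```

Both coefficients share the denominator n(Σx²) − (Σx)², so it is computed once; it is zero only when every x value is the same, in which case no slope can be determined.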

Example 10–9 Car Rental Companies
Find the equation of the regression line for the data in Example 10–4, and graph the line on the scatter plot of the data.

Solution
The values needed for the equation are n = 6, Σx = 153.8, Σy = 18.7, Σxy = 682.77, and Σx² = 5859.26. Substituting in the formulas, you get

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(18.7)(5859.26) - (153.8)(682.77)}{(6)(5859.26) - (153.8)^2} = 0.396$$

$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{(6)(682.77) - (153.8)(18.7)}{(6)(5859.26) - (153.8)^2} = 0.106$$

Hence, the equation of the regression line y′ = a + bx is

y′ = 0.396 + 0.106x

To graph the line, select any two points for x and find the corresponding values for y. Use any x values between 10 and 60. For example, let x = 15. Substitute in the equation and find the corresponding y′ value.

y′ = 0.396 + 0.106x
   = 0.396 + 0.106(15)
   = 1.986

Let x = 40; then

y′ = 0.396 + 0.106x
   = 0.396 + 0.106(40)
   = 4.636

Then plot the two points (15, 1.986) and (40, 4.636) and draw a line connecting the two points. See Figure 10–14.

Figure 10–14 Regression Line for Example 10–9: y′ = 0.396 + 0.106x (x axis: Cars (in 10,000s); y′ axis: Revenue (billions))

Note: When you draw the regression line, it is sometimes necessary to truncate the graph (see Chapter 2). This is done when the distance between the origin and the first labeled coordinate on the x axis is not the same as the distance between the rest of the labeled x coordinates, or when the distance between the origin and the first labeled y′ coordinate is not the same as the distance between the other labeled y′ coordinates. When the x axis or the y axis has been truncated, do not use the y′ intercept value to graph the line. When you graph the regression line, always select x values between the smallest x data value and the largest x data value.

Example 10–10 Absences and Final Grades
Find the equation of the regression line for the data in Example 10–5, and graph the line on the scatter plot.

Solution
The values needed for the equation are n = 7, Σx = 57, Σy = 511, Σxy = 3745, and Σx² = 579. Substituting in the formulas, you get

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(511)(579) - (57)(3745)}{(7)(579) - (57)^2} = 102.493$$

$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{(7)(3745) - (57)(511)}{(7)(579) - (57)^2} = -3.622$$

Hence, the equation of the regression line y′ = a + bx is

y′ = 102.493 − 3.622x

The graph of the line is shown in Figure 10–15.

Figure 10–15 Regression Line for Example 10–10: y′ = 102.493 − 3.622x (x axis: Number of absences; y′ axis: Final grade)

Historical Note
In 1795, Adrien-Marie Legendre (1752–1833) measured the meridian arc on the earth's surface from Barcelona, Spain, to Dunkirk, France. This measure was used as the basis for the measure of the meter. Legendre developed the least-squares method around the year 1805.

The sign of the correlation coefficient and the sign of the slope of the regression line will always be the same. That is, if r is positive, then b will be positive; if r is negative, then b will be negative. The reason is that the numerators of the two formulas are the same and determine the signs of r and b, and the denominators are always positive. The regression line will always pass through the point whose x coordinate is the mean of the x values and whose y coordinate is the mean of the y values, that is, (x̄, ȳ).
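Both properties can be checked numerically. The sketch below uses the sums from Example 10–10 and unrounded values of a and b; the variable names are illustrative, and the check is a spot check on one data set rather than a proof.

```python
# Sums from Example 10-10 (absences and final grades).
n, sum_x, sum_y, sum_xy, sum_x2 = 7, 57, 511, 3745, 579

denom = n * sum_x2 - sum_x ** 2          # 804, positive
numerator = n * sum_xy - sum_x * sum_y   # -2912, the same numerator as in the formula for r

b = numerator / denom                    # slope takes the sign of the numerator
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom

x_bar, y_bar = sum_x / n, sum_y / n
predicted_at_mean = a + b * x_bar        # value of the line at x = x-bar

print(f"b = {b:.3f} (numerator {numerator}, denominator {denom})")
print(f"line at x-bar: {predicted_at_mean:.3f}, y-bar: {y_bar:.3f}")
```

Because the denominator is positive, the negative numerator alone makes b negative, and the line evaluated at x̄ returns ȳ (up to floating-point error).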

The regression line can be used to make predictions for the dependent variable. The method for making predictions is shown in Example 10–11.

Example 10–11 Car Rental Companies
Use the equation of the regression line to predict the income of a car rental agency that has 200,000 automobiles.

Solution
Since the x values are in 10,000s, divide 200,000 by 10,000 to get 20, and then substitute 20 for x in the equation.

y′ = 0.396 + 0.106x
   = 0.396 + 0.106(20)
   = 2.516

Hence, when a rental agency has 200,000 automobiles, its revenue will be approximately $2.516 billion.
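The unit conversion in Example 10–11 is easy to get wrong, so a small sketch may help. The function name and its parameters are illustrative assumptions; the default bounds of 10 and 60 are taken from the x range used for graphing in Example 10–9 and are assumed here to stand in for the range of the data.

```python
def predict_revenue(cars, a=0.396, b=0.106, x_min=10.0, x_max=60.0):
    """Point prediction of revenue (in billions of dollars) for a number of cars.

    x_min and x_max are assumed bounds on x (in units of 10,000 cars).
    """
    x = cars / 10_000  # the regression equation measures x in 10,000s of cars
    if not (x_min <= x <= x_max):
        raise ValueError(f"x = {x} lies outside the assumed data range [{x_min}, {x_max}]")
    return a + b * x


revenue = predict_revenue(200_000)
print(f"predicted revenue: ${revenue:.3f} billion")
```

Raising an error for x values outside the range is one possible design choice; a warning would also be reasonable, since predictions beyond the data call for caution rather than being strictly impossible to compute.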

The value obtained in Example 10–11 is a point prediction, and with point predictions, no degree of accuracy or confidence can be determined. More information on prediction is given in Section 10–3.

The magnitude of the change in one variable when the other variable changes exactly 1 unit is called a marginal change. The value of slope b of the regression line equation represents the marginal change. For example, in Example 10–9 the slope of the regression line is 0.106, which means that for each increase of 10,000 cars, the value of y changes 0.106 unit ($106 million) on average.

When r is not significantly different from 0, the best predictor of y is the mean of the data values of y. For valid predictions, the value of the correlation coefficient must be significant. The following assumptions must also be met.

Assumptions for Valid Predictions in Regression
1. The sample is a random sample.
2. For any specific value of the independent variable x, the value of the dependent variable y must be normally distributed about the regression line. See Figure 10–16(a).
3. The standard deviation of each of the dependent variables must be the same for each value of the independent variable. See Figure 10–16(b).

Figure 10–16 Assumptions for Predictions
(a) Dependent variable y normally distributed about the line y′ = a + bx
(b) σ1 = σ2 = . . . = σn

Extrapolation, or making predictions beyond the bounds of the data, must be interpreted cautiously. For example, in 1979, some experts predicted that the United States would run out of oil by the year 2003. This prediction was based on the current consumption and on known oil reserves at that time. However, since then, the automobile industry has produced many new fuel-efficient vehicles. Also, there are many as yet undiscovered oil fields. Finally, science may someday discover a way to run a car on something as unlikely but as common as peanut oil. In addition, the price of a gallon of gasoline was predicted to reach $10 a few years later. Fortunately this has not come to pass. Remember that when predictions are made, they are based on present conditions or on the premise that present trends will continue. This assumption may or may not prove true in the future.

The steps for finding the value of the correlation coefficient and the regression line equation are summarized in this Procedure Table:

Interesting Fact
It is estimated that wearing a motorcycle helmet reduces the risk of a fatal accident by 30%.

Procedure Table
Finding the Correlation Coefficient and the Regression Line Equation

Step 1 Make a table, as shown in step 2.

Step 2 Find the values of xy, x², and y². Place them in the appropriate columns and sum each column.

x       y       xy       x²       y²
·       ·       ·        ·        ·
·       ·       ·        ·        ·
·       ·       ·        ·        ·
Σx =    Σy =    Σxy =    Σx² =    Σy² =

Step 3 Substitute in the formula to find the value of r.

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$

Step 4 When r is significant, substitute in the formulas to find the values of a and b for the regression line equation y′ = a + bx.

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \qquad b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
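The four steps of the Procedure Table can be sketched as a single function. The function name is an illustrative assumption, and the sketch does not test whether r is significant (it computes a and b unconditionally), so the significance check in Step 4 still has to be done separately. The data are the years and contribution values of Exercise 15 in the exercises for this section, for which the answer key lists y′ = 453.176 − 50.439x.

```python
from math import sqrt


def correlation_and_regression(xs, ys):
    """Return (r, a, b) following Steps 1 through 4 of the Procedure Table."""
    n = len(xs)
    # Steps 1 and 2: build the columns x, y, xy, x^2, y^2 and sum each column.
    rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
    sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))
    # Step 3: correlation coefficient.
    numerator = n * sum_xy - sum_x * sum_y
    r = numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    # Step 4: intercept and slope (significance of r is NOT checked here).
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = numerator / denom
    return r, a, b


years = [1, 5, 3, 10, 7, 6]
contribution = [500, 100, 300, 50, 75, 80]
r, a, b = correlation_and_regression(years, contribution)
print(f"r = {r:.3f}, y' = {a:.3f} + ({b:.3f})x")
```

Rounding a and b to three decimal places reproduces the equation given in the answer key; r and b share the same numerator, which is why they carry the same sign.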

A scatter plot should be checked for outliers. An outlier is a point that seems out of place when compared with the other points (see Chapter 3). Some of these points can affect the equation of the regression line. When this happens, the points are called influential points or influential observations.

When a point on the scatter plot appears to be an outlier, it should be checked to see if it is an influential point. An influential point tends to "pull" the regression line toward the point itself. To check for an influential point, the regression line should be graphed with the point included in the data set. Then a second regression line should be graphed that excludes the point from the data set. If the position of the second line is changed considerably, the point is said to be an influential point. Points that are outliers in the x direction tend to be influential points.

Researchers should use their judgment as to whether to include influential observations in the final analysis of the data. If the researcher feels that the observation is not necessary, then it should be excluded so that it does not influence the results of the study. However, if the researcher feels that it is necessary, then he or she may want to obtain additional data values whose x values are near the x value of the influential point and then include them in the study.

"Explain that to me." © Dave Carpenter. King Features Syndicate.
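The check for an influential point described above can be sketched by fitting the line twice, once with the suspect point and once without it. The data below are hypothetical and chosen only for illustration; the last point is an outlier in the x direction. The function names are also illustrative assumptions.

```python
def fit_line(xs, ys):
    """Return (a, b) for the regression line y' = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b


def lines_with_and_without(xs, ys, index):
    """Fit the line with every point, then again with the point at `index` removed."""
    with_point = fit_line(xs, ys)
    without_point = fit_line(xs[:index] + xs[index + 1:], ys[:index] + ys[index + 1:])
    return with_point, without_point


# Hypothetical data; the point (15, 4.0) lies far from the others in the x direction.
xs = [1, 2, 3, 4, 5, 15]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 4.0]

(a_with, b_with), (a_without, b_without) = lines_with_and_without(xs, ys, 5)
print(f"with the point:    y' = {a_with:.3f} + {b_with:.3f}x")
print(f"without the point: y' = {a_without:.3f} + {b_without:.3f}x")
```

Here the slope changes considerably once the point is removed, which is the behavior the text describes for an influential point. How large a change counts as "considerable" is left to the researcher's judgment, so the sketch reports both lines rather than applying a threshold.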

Applying the Concepts 10–2

Stopping Distances Revisited
In a study on speed and braking distance, researchers looked for a method to estimate how fast a person was traveling before an accident by measuring the length of the skid marks. An area that was focused on in the study was the distance required to completely stop a vehicle at various speeds. Use the following table to answer the questions.

MPH                       20   30   40   50    60    80
Braking distance (feet)   20   45   81   133   205   411

Assume MPH is going to be used to predict stopping distance.

1. Find the linear regression equation.
2. What does the slope tell you about MPH and the braking distance? How about the y intercept?
3. Find the braking distance when MPH = 45.
4. Find the braking distance when MPH = 100.
5. Comment on predicting beyond the given data values.

See page 590 for the answers.

Exercises 10–2

1. What two things should be done before one performs a regression analysis?
2. What are the assumptions for regression analysis?
3. What is the general form for the regression line used in statistics? y′ = a + bx
4. What is the symbol for the slope? For the y intercept? b, a
5. What is meant by the line of best fit?
6. When all the points fall on the regression line, what is the value of the correlation coefficient? r would equal +1 or −1.
7. What is the relationship between the sign of the correlation coefficient and the sign of the slope of the regression line?
11. When the value of r is not significant, what value should be used to predict y? When r is not significant, the mean of the y values should be used to predict y.

For Exercises 12 through 27, use the same data as for the corresponding exercises in Section 10–1. For each exercise, find the equation of the regression line and find the y′ value for the specified x value. Remember that no regression should be done when r is not significant.

12. Gas Tax and Fuel Use The gas tax and fuel use are shown.
Tax     21.5   23    18    24.5   26.4   19
Usage   1062   631   920   686    736    684


Find y′ when x = $0.25. Not significant, so no regression should be done.

14. Forest Fires and Acres Burned Number of fires and number of acres burned are as follows:
Fires x   72 69 58 47 84 62 57 45
Acres y   62 41 19 26 51 15 30 15
Find y′ when x = 60. y′ = −31.46 + 1.036x; 30.7

15. Years and contribution data are as follows:
Years x              1 5 3 10 7 6
Contribution y, $    500 100 300 50 75 80
Find y′ when x = 4 years. y′ = 453.176 − 50.439x; 251.42

16. State Debt and Per Capita Taxes Data for per capita state debt and per capita state tax are as follows:
Per capita debt   1924 907 1445 1608 661
Per capita tax    1685 1838 1734 1842 1317
Find y′ when x = $1500 in per capita debt. Not significant, so no regression should be done.

17. School Districts and Secondary Schools The number of school districts and the number of secondary schools in the district are shown.
School districts    53 19 24 17 95 68
Secondary schools   50 27 187 84 143 216
Find y′ when x = 70. Since r is not significant, no regression should be done.

18. Triples and Home Runs The number of triples and the number of home runs obtained by a selected sample of MLB players are shown. …

19. Egg Production …
No. of eggs (million)   957 1332 1163 1865 119 273
Price per dozen ($)     0.770 0.697 0.617 0.652 1.080 1.420
Find y′ when x = 1600 million eggs. y′ = 1.252 − 0.000398x; y′ = 0.615 per dozen

20. Emergency Calls and Temperature Temperature in degrees Fahrenheit and number of emergency calls are shown.
Temperature x    68 74 82 88 93 99 101
No. of calls y   7 4 8 10 11 9 13
Find y′ when x = 80°F. y′ = −7.544 + 0.190x; 7.656, or 8 calls

21. Faculty and Students The number of faculty and the number of students in a random selection of small colleges are shown.
Faculty    99 110 113 116 138 174 220
Students   1353 1290 1091 1213 1384 1283 2075
… Now find the equation of the regression line when x and y are interchanged. y′ = −14.974 + 0.111x

22. Precipitation and Snowfall/Sleet The number of days of precipitation and snowfall/sleet are shown.
Precipitation   61 111 140 116 88 136
Snow/sleet      2 15 21 8 11 13
Find y′ when x = 100 days. y′ = −7.327 + 0.175x; 10.173 in.

23. Average Temperature and Precipitation Temperatures (in degrees Fahrenheit) and precipitation (in inches) are as follows:
Avg. daily temp. x    86 81 83 89 80 74 64
Avg. mo. precip. y    3.4 1.8 3.5 3.6 3.7 1.5 0.2
Find y′ when x = 70°F. y′ = −8.994 + 0.1448x; 1.1

24. NHL Assists and Total Points The number of assists and the total number of points for a sample of NHL scoring leaders are shown.
Assists        26 29 32 34 36 37 40
Total points   48 68 66 69 76 67 84
Find y′ when x = 30 assists. y′ = 2.693 + 1.962x; 62

25. Fat Calories and Fat Grams The number of fat calories and the number of saturated fat grams for a random selection of breakfast entrees are shown.
Fat calories   190 220 270 360 460 540
Sat. fat (g)   9 8 13 17 23 27
…

26. …

27. Hospital Beds …

For Exercises 28 through …, do a complete regression analysis by performing these steps.
a. Draw a scatter plot.
b. Compute the correlation coefficient.
c. State the hypotheses.
d. Test the hypotheses …
e. Determine the …
f. Plot the regression …
g. Summarize the …

28. Fireworks and …
y�y�� ��2.417 �2.417 ��fat 0.055x; 0.055x; Find Find y�y� when when x x� � 400 400 fat fat calories. calories. should beburned done.25 25. Fat Fat Calories Calories and and Fat Fat Grams Grams The The number number ofassists of fat 13. 脂肪 脂肪熱量(以卡路里計)和飽和脂肪 24. NHL NHL Assists Assists and and Total Total Points Points The number The number of assists of Triples Triples 25are50 23 23 51 51 19 19 20 20 43 Years xschools 11734 5187 3841317 101317 743 624. Secondary Secondary schools 50 27 27 187 841842 143 143 216 216 7. 商業電影 新電影在每一家電影院的上映次 (in degrees Fahrenheit) and precipitation (in inches) are creases Per capita 19.6 19.6grams grams Per capita tax tax 1685Movie 1685 1838 1838 1734 1842 calories calories andand thethe number number of of saturated saturated fat fat grams grams forfor a a 13. Commercial Releases New movie releases per andPrecipitation the and total the total number number of points of points for a for sample a sample of NHL of NHL Fires x 72 69 58 47 84 62 57 45 61 111 140 116 88 136 重量(以公克計)如下所示。 asBuildings follows: Stories Home Home runs runs 212 212 199 199 144 144 160 160 149 122 122 8026. do the FindFind Find Find y� when when x�� xgross � 70. r 500 isrnot is not significant, significant, no149 no regression regression Contribution y,70. $Since 100 300 50 75 26. Tall Tall Buildings and and heights heights ofof buildings buildings data data559 Section 10–2 Regression random random selection selection of of breakfast breakfast entrees entrees are are shown. shown. 數與總收入。 y� when y�y� when xbe � $1500 xand $1500 in per inSince capita per capita debt. debt. studio receipts are as follows: scoring scoring leaders leaders are shown. areStories shown. should should be done. done. y 62 41 19 26 51 15 30 15 Snow/sleet 2 x 15 86 2181 83 8 89 11 8013 74 64 e? They NotAcres significant Not significant so no so regression no regression should should be be done. Find Find y� y� when when x x � � 33. 33. Since Since r risdone. 
isnot notsignificant, significant, nono regression regression follow: follow: Avg. daily temp. Find y� when x � 4 years. y� � 453.176 � 50.439x; 251.42 18.18. Triples Triples and and Home Home Runs RunsThe The number number of number of triples triples andand thethe Assists x 26 190 脂肪熱量 Fat Fat calories calories 19029 220 270 270 360 460 Assists 26 29 32220 32 34 34 36360 36 37 460 37 40 540 40540 should should be be done. done. 17. School 17. School Districts Districts and Secondary The The x� 60. No. of releases 361 270Schools 306 22 35 number 10 8 12 21 上映次數 y�Secondary � �31.46 �Schools 1.036x; 30.7 Find y� when xand Find y�xwhen x �54 10040 days. cient related Stories Stories x 64 64 54 40 31 31 45 45 38 38 42 42 41 41 37 37 40 40 0.2 number number of of home home runs runs obtained obtained by by a selected a selected sample sample of of Avg. mo. precip. y 3.4 1.8 3.5 3.6 3.7 1.5 16. State Debt and Per Capita Taxes Data for per capita 19. Egg Production Production Number Number ofof eggs andprice price per per dozen dozen 飽和脂肪重量 of19. school ofEgg school districts districts andand the and number the number ofeggs secondary ofand secondary schools schools Sat. Sat. fat fat (g) 968 9 68 8 866 1776 17 23 2727 Total points points 48yand 48 691313 69 76 67 23 67 84 84 y�Total � �7.327 � (g) 0.175x; 10.173 in66 14. Forest Fires Acres Burned Number of fires and 22. Precipitation Snowfall/Sleet The number of days y are 總收入 Gross receipts pecific value MLB MLB players players are shown. shown. 15. Years and contribution data are as follows: state debt and per capita state tax are as follows: are are shown. shown. Heights Heights yFind yTemperature 841 841 725 725and 635 616 616 615 615 582 535 535 520511 511 485 in theindistrict the district areofshown. are shown. are as follows: y� �582 �8.994 � 520 0.1448x; 1.1485 y�400,求出 when x635 � 70�F. Average and Precipitation Temperatures number ofFind precipitation snowfall/sleet are shown. 
(百萬美元) (million $) acres burned 3844 1962 1371 1064 334 241 188 154 23. 125 xy� y′。 e the predicted Triples 當 � y� � �2.417 � 0.055x; Find when when x30 � 400 400 fat calories. y�fat � calories. 2.693 y� � 2.693 �y� 1.962x; ��2.417 1.962x; 62 � 620.055x; FindFind y� when y�y�when x=� 30 xx�� assists. assists. Triples 25 25153 231924 23 51 19 95 20 Years xeggs 524 5124 3 19 10 76820 64343661 Per capita debt 907 1445 1608 No. No. ofof eggs (in degrees Fahrenheit) and precipitation (in inches) are School School districts districts 53 19 19 17 17 95 68 19.6 19.6 grams grams Find Find y�y�when when x x��44. 44.61 y�y���206.399 206.399� �9.262x; 9.262x; 613.9 613.988 136 Fires x 72 69 58 47 84 62 57 45 Precipitation 116 24. NHL Assists and Total111 Points140 The number of assists 14. 醫院病床 許可床位數和工作人員床位數如 (million) (million) 1332 1332 1163 1163 1865 1865 119 119 273 273 Find y�xy,957 when x212 � 200 new releases. y� � 181.661 �122 7.319x; as26. follows: Home Home runs runs 212 199 199 144 144 160 160 149 149 122 Contribution 500 100 300 50 75 80 25. Fat 25. Calories Fat Calories and and FatStories Grams Fat Grams The number The number of fatof fat data =$957 y′。 當 200,求出 Per capita tax 1685 1838 1734 1842 1317 be said Secondary 26. Tall Tall Buildings Buildings Stories and and heights heights of of buildings buildings data Secondary schools schools 50 50 27 27 187 187 84 84 143 143 216 216 and the total number of points for a sample of NHL 27. 27. Hospital Hospital Beds Beds Licensed Licensed beds beds and and staffed staffed beds beds data data Acres y 1645.562 41$) 19 26 51 15 30 15 calories Snow/sleet 2 15 21 8 11 y� � (million calories and the and number the number of86 saturated of 81 saturated fat grams fat grams for afor a 64 13 下所示。 Price Price per per Avg. daily temp. x 83 89 80 74 Find Find y� y� when when x � x � 33. 33. 
Since Since r 453.176 isrin not isper not significant, significant, no 251.42 no regression regression follow: follow: Find y� when x � 4 years. y� � � 50.439x; Find y� when x � $1500 capita debt. 8. scoring leaders are shown. 校友捐款 校友捐款和畢業年數的數據如下 follow: follow: FindFind y� when y� be when xbe �done. 70. x �x Since 70.60. Since r is not r issignificant, not significant, no regression no regression should should done. random selection selection of breakfast of breakfast entrees entrees are shown. are shown. y� �0.617 �31.46 � 1.036x; 30.7 y�done. when � Not significant so 0.697 no regression should be done. dozen dozen ($) ($) 0.770 0.770 0.697 0.617 0.652 0.652 1.080 1.080 1.420 1.420 random Find when xy54 �40 100 days. should should beFind done. be Avg. mo. precip. 3.4 1.8 0.2 Stories Stories x �7.327 xy� 64 64 40 31 31 453.5 45 383.6 38 423.7 42 411.5 41100 37 37169 40 16. State Debt andDistricts Per Number Capita Taxes Data for perper capita 26 32 34 36 3740 40 所示。 19. 19. Egg Egg Production Production Number of of eggs eggs and and price price per dozen dozen x540.175x; 許可床位數 Licensed Licensed beds beds xx190 144 144 32 32 175 175 185 185 208 208 100 17. School and Secondary Schools The number y�calories �Assists � 10.173 in 29 18. Triples 18.15. Triples and and Home Home Runs Runs The number The number of triples of triples and the and the Fat calories Fat 190 220 220 270 270 360 360 460 460 540 540169 Years and contribution data are asasfollows: Find Find y� y� when when x x � � 1600 1600 million million eggs. eggs. state debt and per capita state tax are follows: are are shown. shown. Heights y841 841 725 725 635 616 616 615 615 582 535 535 520 520 511 511485 485 84 of school districts the number of secondary 23. Average Temperature and Precipitation Temperatures y� � �8.994 � 582 0.1448x; 1.1 Find y�ywhen xpoints � 70�F. 
Total 48 68 66141 69 76 67 y635 工作人員床位數 number number of� home of home runs runs obtained obtained by a by selected a selected sample sample of ofschools Heights y�y�� 1.252 1.252 �� 0.000398x; 0.000398x; y�and y��� 0.615 0.615 per per dozen dozen Staffed Staffed beds beds 112 32 162 162 103 103 80 80 118 118 fat (g) fat (g) 9y y 9 8112 832 13 13 17141 1723 23 27 27 xand Years xdebt 1907 51445Temperature 3Temperature 10 661 7inin 6 Sat. Sat. 畢業年數 in the district are shown. (in degrees Fahrenheit) and precipitation (in inches) are Per capita 1924 1608 20. 20. Emergency Emergency Calls Calls and Temperature Temperature No. No. of of eggs eggs MLBMLB players players are shown. are shown. Find Find y� y� when when xx x� x�� 44. 44. y�30 � y�assists. 206.399 �The 206.399 � 9.262x; � 9.262x; 613.9 � 2.693 �613.9 1.962x; 62 Find y� when x44. �Points 24. NHL Assists and Total of48.267 assists y�y��� 22.659 22.659number �y� �0.582x; 0.582x; 48.267 Find Find y�y� when � 44. follows: y957 捐款 xwhen y′。 Contribution y, $1685 500 100 3001842 50 75 8068 當 degrees degrees Fahrenheit Fahrenheit and and number number ofof emergency emergency calls calls are are (million) (million) 957 1332 1332 1163 1163 1865 1865 119 273 273 y� � �2.417 y� � �2.417 � 0.055x; � 0.055x; FindFind y�as when y� when x=�44,求出 400 x � fat 400calories. fat calories. School districts 53 19 17 95 Per capita tax 1734 Triples Triples 25 2523 231838 51 5119 1924 20 119 20 431317 43 and the total number of points for a sample of NHL 27. 27. Hospital Hospital Beds Beds Licensed Licensed beds beds and and staffed staffed beds beds 19.6 grams 19.6 grams shown. shown. 25. Fat Calories and Fat Grams number ofdata Avg. daily temp. x 33, 86do 83The89 80data 74fat 64 Find y� when x� 4 years. y�50 � 453.176 � 50.439x; 251.42 Price Price per For Exercises Exercises 28 28 through through 33, do81 aacomplete complete regression regression Secondary schools 27 187 84 143 216 For Find y�per when � per debt. 
scoring leaders are shown. =$1500 y′。 當xx212 4,求出 針對練習題 15 到 17,透過執行以下每一步驟完 Home Home runs runs 212 199in199 144capita 144160 160 149 149 122 122 follow: follow: 26. Tall 26. Tall Buildings Buildings Stories Stories and heights and heights of buildings of buildings data data calories and the number of saturated fat grams for0.2 a Temperature Temperature x x 68 68 74 74 82 82 88 88 93 93 99 99 101 101 Not significant so no regression should be done. dozen dozen ($)($)Debt0.770 0.770 0.697 0.6970.617 0.6170.652 0.652for 1.080 1.080 1.420 1.420 analysis analysis by performing performing these steps. Avg.by mo. precip. y these 3.4steps. 1.8 34 3.5 3.6 3.7 1.5 16. State and Per Capita Taxes Data perno capita Find y� 33. � rsignificant, is not significant, regressionfollow: Assists 26 29 32 36208 37 40169 Find Find y� when y�Districts when x學區與高中 學區的個數和它有幾所高中的 � xwhen � 33.xSecondary Since Since r 70. is not r Since issignificant, not no regression nonumber regression follow: 9. 成迴歸分析。 17. School and Schools The Licensed Licensed beds beds x x 144 144 32 32 175 175 185 185 208 100 100 169 random selection of breakfast entrees are shown. should be No. of ofdebt calls calls yxydone. 7million 7 state 44 tax 8eggs. 8are10 10 follows: 1111 99 13 13 state and per capita as Find Find y� y� when when � x� 1600 1600 million eggs. should beNo. done. be done. a.a. Draw Draw a ascatter scatter plot. plot. y� � �8.994 � 0.1448x; 1.1 Find y�64 when � ofshould school districts andHome they�number of secondary schools Total points 68 66 69141 76103 6737 84 18. Triples and Runs The number of triples and theStories Stories x Staffed x64 54 54 40 40 3170�F. 31 45 38 38 42 42 41 41 37360 40 40 y� � y�Production 1.252 � 1.252 � 0.000398x; � 0.000398x; � y� 0.615 �eggs 0.615 per per dozen dozen Staffed beds beds y xy48 112 112 32220 162 162 141 103 8080 118118 Fat calories 1903245 270 460 540 數據如下所示。 19. 
Egg 19.inEgg Production Number Number of eggs of and price and price per dozen per dozen y�y����7.544 �7.544 ��0.190x; 0.190x;7.656, 7.656, oror8 8calls calls Find Find y� y� when when x x � � 80�F. 80�F. a. 繪製一張散佈圖。 Per capita debt 1924 907 1445 1608 661 b. b. Compute Compute the the correlation correlation coefficient. coefficient. the district are shown. 20. 20. Emergency Emergency Calls Calls and and Temperature Temperature Temperature Temperature in in number of home runs obtained by a selected sample of are shown. are shown. 24. NHL Assists and Total Points The number of assists Heights Heights yFind 841 y 841 725 725 635 635 616 616 615 615 582 582 535 535 520 520 511 511 485 485 y� � 2.693 � 1.962x; 62 Find y� when x � 30 assists. Sat. fat (g) 9 8 13 17 23 27 y� 22.659 � 22.659 � 0.582x; � 0.582x; 48.267 48.267 y� y� when when xhypotheses. � x� 44.44.y� � c.c.Find State State the thehypotheses. Per capita tax 1685 1838 1734 1317 21. 21. Faculty Faculty and and Students The The number of of faculty and and the the degrees degrees Fahrenheit Fahrenheit and and number number of emergency emergency calls calls are are 計算相關係數。 xStudents 學區 MLB players are shown. School districts 53 19number 24of 17faculty 951842 68 andb.the total number of points for a sample of NHL No. of No.eggs ofshown. eggs Test Test the hypotheses afat aThe ��9.262x; 0.05. 0.05. Use Use Table Table I.I. � 0.055x; Find Find y�d.d. when y� when xthe � 44. xhypotheses �Fat 44.x�Grams y� 206.399 y� 400 �atat 206.399 � � 9.262x; 613.9 613.9 number number of ofstudents students inina arandom random selection selection ofofsmall small shown. y�of � �2.417 y� when 25. Fat Calories and fat Find y� when $1500 capita debt. scoring leaders are � shown. yx �1332 Triples 25 in 23187 51 19273216 20 43 c.Find 陳述假設。 高中 Secondary schools 50 27per 84 143 For For Exercises Exercises 28 28 through through 33,33, do do acalories. 
complete anumber complete regression regression (million) (million) 957 957 1332 1163 1163 1865 1865 119 119 273 19.6 grams colleges colleges are are shown. shown. Not significant so no regression should be done. e. e. Determine Determine the the regression regression line line equation. equation. calories and the number of beds saturated fatbeds grams for a Temperature Temperature x x 6868 7474 8282 8888 9393 9999 101 101 Hospital 27. Hospital Beds Beds Licensed Licensed and staffed and staffed beds data analysis analysis by by performing performing these steps. steps. Assists 26 29 32 34 36data37 data 40 Home 212 199 144 no 160 149 27.122 Findper y� when xxruns � 70. Since r is not significant, regression α these = d. 使用表 H breakfast 在beds 0.05 之下檢定假設。 17. School Districts and Secondary Schools The number PricePrice per Tall Buildings Stories and heights ofplot. buildings f.26. f. Plot Plot the theregression regression line line on onthe the scatter scatter plot. random selection of entrees are shown. = y′。 當 70,求出 Faculty Faculty 99 99 110 110 113 113 116 116 138 138 174 174 220 220 follow: follow: should be done. No. No. of of calls calls y y 7 7 4 4 8 8 10 10 11 11 9 9 13 13 a. a. Draw Draw a scatter a scatter plot. plot. of districts the0.617 number secondary schools dozen dozen ($) ($)school 0.770 0.697 0.697 0.652 1.080 1.420 1.420 Find y�0.770 when xand �0.617 33. Since r 0.652 isof not significant, no regression Total points 190 48 6827066 360 69 460 76 67 84 e.follow: 決定迴歸線方程式。 g.g. Summarize Summarize the theresults. results. 18. Triples and Home Runs The number of1.080 triples and the Fat calories 10. 蛋的產量 蛋的產量和每一打的價格的數據 Students Students 1353 1290 1290 1091 1213 1213 1384 1283 1283 2075 2075 should be1353 Licensed beds beds x the x144 144 32220 175 32 coefficient. 175 185 185 208 208 100 100 169540 169 in the district are shown. 
y� � y�1091 �7.544 � �7.544 � 0.190x; �1384 0.190x; 7.656, 7.656, or 8or calls 8 calls Licensed Find Find y� when y� when xdone. � x 1600 � 80�F. 80�F. b. b.Compute Compute the correlation correlation coefficient. Find Find y� when y� when x � 1600 x � million million eggs. eggs. number of home runs obtained by aofselected sample of dozen Stories xwhen64 54 31 452.693 38� 1.962x; 42 41 37 40 y� � 62 Find y� x 9� 30Injuries assists. f.(g) 在散佈圖上畫出迴歸線。 19. Egg Production Number eggs line and price per Sat. fat 840 13 17 23 27118 28. Fireworks Fireworks and and These These data data were were obtained obtained Now Now find the theStudents equation equation of0.615 the thenumber regression regression line when x xthe and and y y Staffed 如下所示。 c.28. c. State State hypotheses. y�21. �MLB 1.252 y�Faculty � 1.252 �find 0.000398x; � 0.000398x; y� � 0.615 y� of �The per dozen per Staffed beds beds y thethe yhypotheses. 112 112 32 Injuries 162 32 162 141 141 103 103 80 80 118 School districts 53 19 dozen 24of 17when 95 68 21. Faculty and and Students The number of faculty faculty and and the players are shown. are shown. Heights y 841 725 635 616 615 582 535 520 511 485 forg. the the years years 1993 1993 through through 1998 and andUse indicate indicate the the number number are are interchanged. interchanged. y�y��� �14.974 �14.974�� 0.111x 0.111x 20. Emergency 20. Emergency Calls Calls and and Temperature Temperature Temperature insmall in 摘要結果。 d.for d. Test the hypotheses hypotheses at22.659 a at1998 � a0.582x; � 0.05. Use Table Table I.fat I. 25. Fat Calories and Fat Grams The number number number of of students students in Temperature in a23 random a50 random selection selection of of small y�0.05. �48.267 �2.417 �of0.055x; Find y�Test when xthe �� 400 fat22.659 calories. Secondary schools 27 187 84 143 216 y� � y� � � � 0.582x; 48.267 Find Find y� when y� when x � 44. x 44. 
Triples 25 51 19 20 43 degrees degrees Fahrenheit number and number of emergency of emergency callscalls are are No. ofFahrenheit eggs xand 蛋的產量 19.6 grams Find y� when x� 44.ofy�saturated � line 206.399 � grams 9.262x; for 613.9 and the number fat a colleges colleges areare shown. shown. e.calories e.Determine Determine thethe regression regression line equation. equation. Find y� when 212 x �957 70.199 Since r is not significant, no regression (million) 1332144 1163 119 273 shown. shown. (百萬) Home runs 160 1865 149 122 For random selection of breakfast entrees are shown. Buildings Stories and heights of buildings data Exercises ForTall Exercises 28 through 28 through 33, do 33, a do complete a complete regression regression f.27. f. Plot Plot the the regression regression line line on on the the scatter scatter plot. plot. should be done. 10–27 10–27 Faculty Faculty 9999 110110 113113 116116 138 138 174 174 220 220 26. Hospital Beds Licensed beds and staffed beds data Temperature Temperature xand xHome 68 Since 68 74 r74 82The 88 88 93 of 93 99triples 99 101and 101the 每打價格 Price per Find y� when x y� 33. is not82 significant, no regression follow: analysis analysis by performing by performing these these steps. steps. 18. Triples Runs number calories 190 220 270 360 460 540 g.Fat g. Summarize Summarize thethe results. results. follow: Students Students 1353 1353 1290 1290 1091 1091 1213 1213 1384 1384 1283 1283 2075 2075 should be done. (美元) dozen ($)ofy home70.770 0.697 0.617 0.652 1.080 number runs obtained by a10 selected sample of1.420 No. of No. calls of calls y 7 4 4 8 10 8 11 11 9 9 13 13 Stories x 64 54 40 31 45 42 41 37 40100 a. Draw a. Draw a scatter a scatter plot. plot. Sat. fatFireworks (g) beds 8 38 13175 17 23obtained 27 169 19. Egg Production Number of and price per dozen Licensed x9Injuries 144 32These 185 208 28. 28. 
Fireworks and and Injuries These data data were were obtained Now Now find find the the equation equation of1600 of theeggs the regression regression line line when when x and x and y y MLB players are shown. Find y� when x � million eggs. y� � �7.544 y� � �7.544 � 0.190x; � 0.190x; 7.656, 7.656, or 8 calls or 8 calls Find Find y� when y� when x � 80�F. x � 80�F. b. Compute b. Compute the correlation the correlation coefficient. coefficient. areare shown. Heights y 841 725 635 616 615 582 535 520 511 485 for for the the years years 1993 1993 through through 1998 1998 and and indicate indicate the the number number are interchanged. interchanged. y� � y� �14.974 � �14.974 � 0.111x � 0.111x y� � 1.252 � 0.000398x; y� � 0.615 per dozen 32 162 103 80 118 y� � 141 �2.417 � 0.055x; FindStaffed y� whenbeds x �y400 fat112 calories. Triples 25 23 51 19 20 43 c. State c. State the hypotheses. the hypotheses. grams eggs 20.and Emergency Calls andnumber Temperature in 21.No. Faculty 21.ofFaculty and Students Students The number The of faculty of faculty andTemperature the and the Find19.6 y� when x � 44. y� � 206.399 � 9.262x; 613.9 492 y� � 22.659 � Table 0.582x; when at x� 44.a0.05. Home 212 199 144ofofsmall 160 149 122are d. 26. Test d. Tall Test theFind hypotheses they�hypotheses a at � 0.05. Use Table Use I. I.48.267 (million) 957Fahrenheit 1163 1865 119 273calls and number emergency number number of degrees students ofruns students in a1332 random in a random selection selection of small Buildings Stories and�heights of buildings data 10–27 27. Hospital Beds Licensed beds and staffed beds colleges colleges areshown. shown. are shown. e. follow: Determine the regression the regression line equation. line equation. data10–27 Find y� when x � 33. Since r is not significant, no regression e. Determine Price per For Exercises 28 through 33, do a complete regression follow: should be done. f. Stories Plot f. 
analysis Plot thex regression theby regression linethe onscatter the scatter plot.42 plot.41 37 40 x113 113 68 116 74 88174 93 99 101 Faculty Faculty 990.770 991100.697 110 116 138 82 138 174 220 220 dozen ($) Temperature 0.617 0.652 1.080 1.420 performing these 64 54line40on 31 45steps. 38 19. Egg Production Number of eggs and price per dozen Licensed beds xresults. 144 32 175 185 208 100 169 g. Summarize g. Summarize the the results. No. ofx1353 calls y1290 7eggs. 4 1384 81283 101283 11 2075 9 13 Heightsa.y Draw Find y� when � 1600 million Students Students 1353 1290 1091 1091 1213 1213 1384 2075 are shown. a scatter 841 725 635 plot. 616 615 582 535 520 511 485 y� � 1.252 � 0.000398x; y� � 0.615 per dozen Staffed beds y 112 32 162 141 103 80 118 y� � �7.544 � 0.190x; Find y�equation whenofxthe �of80�F. 28. Fireworks and and Injuries Injuries These These data data werewere obtained obtained Now Now find find equation the regression the regression line when line when x andx7.656, yand yor 8 calls28. Fireworks b. Compute the correlation coefficient. No. of the eggs
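The prediction exercises above all reduce to the same arithmetic: compute the slope b and intercept a of the least-squares line y′ = a + bx, then substitute the requested x. The following is a minimal sketch of that computation in plain Python, using the data of Exercise 15 (years and contribution) and Exercise 21 (faculty and students, with x and y interchanged). The function name fit_line and the variable names are illustrative choices, not part of the textbook.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope and intercept from the usual computational formulas.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Exercise 15: years since graduation (x) and contribution in dollars (y).
years = [1, 5, 3, 10, 7, 6]
contribution = [500, 100, 300, 50, 75, 80]
a, b = fit_line(years, contribution)
predicted = a + b * 4  # y' when x = 4 years

# Exercise 21 with x and y interchanged: students become x, faculty become y.
faculty = [99, 110, 113, 116, 138, 174, 220]
students = [1353, 1290, 1091, 1213, 1384, 1283, 2075]
a_swapped, b_swapped = fit_line(students, faculty)

print(f"Exercise 15: y' = {a:.3f} + ({b:.3f})x, y'(4) = {predicted:.2f}")
print(f"Exercise 21 (interchanged): y' = {a_swapped:.3f} + {b_swapped:.3f}x")
```

The rounded results agree with the answers listed for these two exercises (y′ = 453.176 − 50.439x with y′ = 251.42 at x = 4, and y′ = −14.974 + 0.111x). The sketch does not check whether r is significant, so that check still has to be made before the line is used for prediction.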



28. Fireworks and Injuries  These data were obtained for the years 1993 through 1998 and indicate the number of fireworks (in millions) used and the related injuries. Predict the number of injuries if 100 million fireworks are used during a given year.

Fireworks in use x     67.6     87.1     117      115      118    113
Related injuries y     12,100   12,600   12,500   10,900   7800   7000

Source: National Council of Fireworks Safety, American Pyrotechnic Assoc.

29. Farm Acreage  Is there a relationship between the number of farms in a state and the acreage per farm? A random selection of states across the country, both eastern and western, produced the following results. Can a relationship between these two variables be concluded?

No. of farms (thousands) x    77    52    20.8   49    28    58.2
Acreage per farm y            347   173   173    218   246   132

Source: World Almanac.

30. SAT Scores  Educational researchers desired to find out if a relationship exists between the average SAT verbal score and the average SAT mathematical score. Several states were randomly selected, and their SAT average scores are recorded below. Is there sufficient evidence to conclude a relationship between the two scores?

Verbal x    526   504   594   585   503   589
Math y      530   522   606   588   517   589

Source: World Almanac.

31. Coal Production  These data were obtained from a sample of counties in southwestern Pennsylvania and indicate the number (in thousands) of tons of bituminous coal produced in each county and the number of employees working in coal production in each county.

32. Television Viewers  A television executive selects 10 television shows and compares the average number of viewers the show had last year with the average number of viewers this year. The data (in millions) are shown. Describe the relationship.

Viewers last year x    26.6   17.85   20.3   16.8   20.8
Viewers this year y    28.9   19.2    26.4   13.7   20.2

Viewers last year x    16.7   19.1   18.9   16.0   15.8
Viewers this year y    18.8   25.0   21.0   16.8   15.3

Source: Nielsen Media Research.

33. Absences and Final Grades  An educator wants to see how the number of absences for a student in her class affects the student's final grade. The data obtained from a sample are shown.

No. of absences x    10   12   2    0    8    5
Final grade y        70   65   96   94   75   82

For Exercises 34 and 35, do a complete regression analysis and test the significance of r at α = 0.05, using the P-value method.

34. Father's and Son's Weights  A physician wishes to know whether there is a relationship between a father's weight (in pounds) and his newborn son's weight (in pounds). The data are given here.

Father's weight x    176   160   187   210   196   142   205   215
Son's weight y       6.6   8.2   9.2   7.1   8.8   9.3   7.4   8.6

35. Age and Net Worth  Is a person's age related to his or her net worth? A sample of 10 billionaires is selected, and the person's age and net worth (in billions of dollars) are compared. The data are given here.

Age x    56   39   42   60   84   37   68   66   73   55

To test the significance of b and r:
1. Press STAT and move the cursor to TESTS.
2. Press E (ALPHA SIN) for LinRegTTest. Make sure the Xlist is L1, the Ylist is L2, and Freq is 1. (Use F for TI-84.)
3. Select the appropriate alternative hypothesis.
4. Move the cursor to Calculate and press ENTER.

Example TI 10–4
Test the hypothesis H0: ρ = 0 for the data in Example TI 10–1. Use α = 0.05.
[Calculator screens: Input, Output]
A sample ofnewborn 10 worth billionaires selected, and person’s and net are is (in pounds) and his son’s Net worth each Predict the the amount of coalscore. produced for a father’s αnet =the p 值法檢定 r 的顯著性。 在 噸計)和每一郡煤業的員工數。如果某一郡 0.05 之下用age conclude a relationship between twoPennsylvania scores? score and the average SAT mathematical Several a sample of counties in southwestern and indicate the number (in thousands) of tons of selected, and the person’s age and net worth are compared. The $) data are given weight (in pounds). are given 34. Father’s anddata Son’s Weights A physician (billion yThe 18 14here. 12 14here. 11 10 10 wishes 7 7 5 county that has 500 selected, employees. states were randomly and SAT 資料來源:The Associated 位員工,預測它會生產多少煙煤? Verbal x 有 526 504 594 county 585oftheir 503 589 and indicate the500 number (ininthousands) tons of average bituminous coal produced each and the compared. The data are given to know whether there ishere. aPress. relationship between a Source: The Associated Press. Age x 56 39 42 60 84 37 68 66 73 55 Father’s weight x 176 160 187 210 196 142 205 215 scores are recorded below. Iscoal thereproduction sufficient evidence to No. ofof employees bituminous coal produced in each county and the number working in in father’s weight (in pounds) and his newborn son’s Math yconcludexa530 522 between 606 588 517 589 Age worth x 年紀 x 56 39 42 60 84 37 68 66 73 55 the two 員工數 employees x relationship 110 731 1031 20 118 scores? 1162 Son’sNet weight y (in pounds). 6.6 8.2 The 9.2data 7.1are8.8 9.3here. 7.4 8.6 number of employees working inofcoal production in a103 752 each county. Predict the amount coal produced for weight given Source: World Almanac. y 財產 Net worth (billion $) y 18 14 12 14 11 10 10 7 7 5 each county. Predict the amount of coal produced for a Verbal x 526 504 594 585 503 589 county that has 500 employees. 噸數 Tons y y 227 5410 5328 147 729 8095 635 6157 35. 
Age and Net Worth Is a person’s age 196 related to Father’s weight x 14176 210 (billion $)Associated y 18 12 160 14 187 11 10 10 7 1427 205 5 215 Source: The Press. county that has 500 employees. These522 data were MathProduction y 530 606obtained 588 from 517 589 his or her net worth? A sample of 10 billionaires is No. 31. of Coal Son’s weight y Source: The Associated Press. 6.6 8.2 9.2 7.1 8.8 9.3 7.4 8.6 counties southwestern selected, and the person’s age and worthisare No. aofsample this case, the t test value is 4.050983638. Thenet P-value 0.0154631742, which is signifi employees x ofWorld 110 731in 1031 20 118Pennsylvania 1162 103 In752 Source: Almanac. and indicate the number (in thousands) of1162 tons of compared. data are here. 35. The Age and Netgiven Worth Is�a 0.05, person’s age related to � 0.05; r � employees x 110 731 1031 20 118 103 752 The decision is to reject the null hypothesis at a since 0.0154631742 Tons y 227 5410 5328 147 729 8095 635 6157 2 bituminous in each county 31. coal Coalproduced Production These data and werethe obtained from his or her net worth? A sample of 10 billionaires 0.8966728145, r � 0.8040221364. Age x 56 39 42 60 84 37 68 66 73 55is Tons y 227 5410 5328 147 729 8095 635 6157 number employees working coal production in 技 術 步counties 驟 the 解 析 aofsample of in in southwestern Pennsylvania selected, the person’s age and are line in Y1 for graph There are two other waysand to store the equation fornet theworth regression Extending Concepts Net worth each county. Predictthe thenumber amount(in ofthousands) coal produced for of a and indicate of tons compared. The data are given here. LinReg(a�bx) 1. Type Y after the Verify y this 18result 14 command. 12 14 11 7 7 515 and 16 by using the 10 data10 in Exercises 36.bituminous Forhas Exercises 13, 15, andin21 in Section 10–1, county that 500coal employees. produced each county and thefind the 1 (billion $) Age x 56and 3910–2. 
42 453.173; 60 84 regression 37 68 should 66 73not55 散佈圖 2. Type Y in the RegEQ: spot in LinRegTTest. of Sections 10–1 be done. mean of the x and y variables. Then substitute the mean1 Source: The Associated Press. the number employees working in coal production in No. of worth Extending the Concepts of the x variable into the corresponding regression line To get Y do this: 38.Net each county. Predict the amount of coal produced for a The value of the correlation coefficient can also be 1 employees x 110 731 1031Concepts 20 118 Chart 1162 Wizard 103 752 當你使用 Extending the (billion $) 18 14Y-VARS, 12 14 press 11 10 10 7 7 press 5 1 for Y . equations found Exercises 13, 15, and 21Press in的時候,產生一張散佈圖是一件直接了當的事。 thisVARS for variables, county thatStep has 500inemployees. found byyusing theto formula Step by move 1 for 1 Verify this result bycursor using the data in Exercises 15Function, and 16 36. For Exercises 13, 15, and 21 in Section 10–1, find the Tons y 227 and 5410 147 729the8095 section find5328 y�. Compare value635 of y�6157 with y for Source: The Associated Press. 1. bs 為了使用 Scatter Plot(散佈圖)選項,你必須至少有兩欄的數據。 Verify this result by the data in Exercises 15 not andbe16 No.Exercises of 36. For 13, 15, and 21 in Section 10–1, find the of Sections 10–1 and 10–2. 453.173; regression should done. mean ofeach the xexercise. and y variables. Then substitute the mean x using Generalize the results. r� of Sections and employees 110 731 1031 20 118 1162 103 752 38. 453.173; regression sy 10–2. mean the xx andinto y variables. substitute the mean of the of x variable the corresponding regression line 2.Then The value of10–1 the correlation coefficient can should also benot be done. 反白想要畫圖的數據。從 toolbar(工具列)點選 Insert(插入),然後點選 Scatter Plots 37. 
The y intercept value a can also be found by using the ofTons the xy variable into the corresponding regression line equations found in Exercises 13, 15, and 21 in this The value of the coefficient alsox values be by using formula 227 5410 5328 147 729 8095 635 615738. found iscorrelation the standard deviationcan of the and sy is where sx the 的第一類。 equation equations in Exercises 13, 15, 21 this a scatter plotby isusing straightforward when youy use the Chart section andfound find y�. Compare value y� in with for y Creating Step the byScatter Stepandofchart found the formula the deviation of the values. VerifyWizard. this result bsstandard x section and find y�. Compare value of y� with y for each exercise. Generalize the3.the results. r� 用滑鼠左鍵點選圖上任何一處,你會自動帶出 toolbar(工具列)上頭的 1. You must have at least two columns data to use the Scatter Plot option. a� ythe � bx Concepts rChart � �0.543; 0.812 for 18 andof20 of Section 10–1. bsExercises Extending each exercise. Generalize the results. r � syx 37. The y intercept value a can alsoTools(畫圖工具)。Chart be found by using the 2. HighlightTools(畫圖工具)選單包含三種額外編輯圖形的 the data tosbe y plotted. Select the Insert tab from the toolbar. Then select the the standard deviation of the x values is wherethis s isresult Verify using the data Exercises 15and andsy16 36. 13,value 15, and 21 also in Section 10–1, find the 37.For TheExercises y intercept a can be found by using the Scatter chart equation andxthe firstby type (Scatter withinonly markers). 選項:Design、Layout 以及 Format。 is the standard deviation of the x values and s is where s the standard deviation of the y values. Verify this result 10–28 x y of Sections 10–1 and 10–2. 453.173; regression should not be done. mean of the x and y variables. Then substitute the mean equation Extending the Concepts 3. By left-clicking anywhere the automatically bring up the Chart Tools g the standard deviation yyou values. 
Verify result � y � bxinto the corresponding r � this �0.543; 0.812 for Exercises 18 on and 20ofchart, ofthe Section 10–1. of the xa variable regression line 為圖形以及座標軸加標題 4. 你可以點選 Layout (title),然後從 Labels group 點 The value of the Tools correlation coefficient can also be on the38. toolbar. The Chart includes three additional tabs for editing your c �found y � bxin Exercises �0.543; 0.812 for Exercises 18 result and 20menu Section Verify this byofusing the10–1. data inr � Exercises 15 and 16 equations 13,21 15,inand 21 in 10–1, this find the 36. aFor Exercises 13, 15, and Section found by using the formula Design, Layout, and Format. 選合適的選項。 of Sections 10–1 and 10–2. 453.173; regression should not be done. sectionmean and find y�.xCompare the value of y�substitute with y for of the and y variables. Then the mean bsxyour chart and to the axes by selecting the Layout tab, then selec 10–28 4. You titles each exercise. Generalize r The � tovalue of the x variable intothe theresults. corresponding regression linecan add38. ofthe theLabels correlation coefficient can also be 10–28 sy from the appropriate option group. 相關係數 equationsvalue founda in Exercises 13, 15,byand 21 in found by using the formula 37. The y intercept can also be found using thethis where sx is the standard deviation of the x values and sy is section and find y�. Compare the value of y� with y for equation Correlation Coefficient bs Excel CORREL 函數不需要迴歸分析就可以回饋相關係數。 the standardrdeviation each exercise. Generalize the的 results. � x of the y values. Verify this result sy returns The CORREL function in Excel the correlation coefficient without regression ana � yy�intercept bx �0.543; 0.812 for Exercises 18 and 20 of Section 10–1. r � 37.aThe value a 1. can在 also be found by using the A 欄和 B 欄輸入數據。 1. Enter the data inwhere columns B. standard deviation of the x values and sy is sx is Atheand equation 2. 點選空白儲存格,接著從 Formulas。 the standard the y values. thistoolbar. result 2. 
Selecttoolbar(工具列)點選 a blank cell, and thendeviation select theofFormulas tab Verify from the 10–28 a � y � bx r � �0.543; 0.812 for Exercises 18 and 20 of Section 10–1. 3. 從 toolbar(工具列)點選 Function 小圖示。 3. SelectInsert Insert Function icon from the toolbar. 4. 點選 Statistical(統計類)函數,接著點選 CORREL 函數。 4. Select the Statistical function category and select the CORREL function. 10–28 5. Enter the data range A1:AN, where N is the number of sample data pairs for the first va 5. 輸入數據範圍 A1:AN,其中的 N 是出現在 Array1 第一個變數的成對樣本的個 in Array1. Enter the data range B1:BN for the second variable in Array2, and then click 數。為 Array2 的第二個變數輸入數據範圍 B1:BN,然後點選 [OK]。


Correlation and Regression

This procedure will allow you to calculate the Pearson product moment correlation coefficient without performing a regression analysis.

1. Enter the data from the example shown in a new worksheet. Enter the six values for the x numbers in column A and the corresponding y numbers in column B.

Example
x    43    48    56    61    67    70
y    128   120   135   143   141   152

2. Select Data from the toolbar. Then select Data Analysis. Under Analysis Tools, select Correlation.
3. In the Correlation dialog box, type A1:B6 for the Input Range and check the Grouped By: Columns option.
4. Under Output options, select Output Range, and type D2. Then click [OK].

This procedure will allow you to conduct a regression analysis and compute the correlation coefficient. Use the data from Example 10–2.

1. Select the Data tab on the toolbar, then Data Analysis>Regression.
2. In the Regression dialog box, type B1:B6 in the Input Y Range and type A1:A6 in the Input X Range.
3. Under Output options, select Output Range, and type D6. Then click [OK].

Note: To see all of the decimal places for the statistics in the Summary Output, expand the width of columns D to L.
1. Highlight columns D through L.
2. Select the Home tab, and then select Format>AutoFit Column Width.
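The spreadsheet steps above can be cross-checked by hand. The following is a minimal Python sketch, not the textbook's own procedure: it assumes the six (x, y) pairs from the Example table, and the helper names `pearson_r` and `least_squares` are illustrative. It computes the Pearson correlation coefficient, the least-squares line, and the test statistic t = r·sqrt((n − 2)/(1 − r²)) for the hypothesis that the population correlation is zero.

```python
import math

# Six (x, y) pairs from the Example table (assumed to be the worksheet data).
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient (hypothetical helper)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


def least_squares(xs, ys):
    """Return (a, b) for the line y' = a + b*x (hypothetical helper)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    return a, b


r = pearson_r(x, y)
a, b = least_squares(x, y)
# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t = r * math.sqrt((len(x) - 2) / (1 - r ** 2))

print(f"r = {r:.4f}, r^2 = {r * r:.4f}, t = {t:.4f}")
print(f"y' = {a:.3f} + {b:.3f}x")
```

For these six pairs the sketch should give r near 0.897 and t near 4.05, in line with the calculator output quoted in the text.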

10–3 Coefficient of Determination and Standard Error of the Estimate

The previous sections stated that if the correlation coefficient is significant, the equation of the regression line can be determined. Also, for various values of the independent variable x, the corresponding values of the dependent variable y can be predicted. Several other measures are associated with the correlation and regression techniques. They include the coefficient of determination, the standard error of the estimate, and the prediction interval. But before these concepts can be explained, the different types of variation associated with the regression model must be defined.

Types of Variation for the Regression Model

Consider the following hypothetical regression model.

x    1    2    3    4    5
y    10   8    12   16   20


The equation of the regression line is $y' = 4.8 + 2.8x$, and $r = 0.919$. The sample y values are 10, 8, 12, 16, and 20. The predicted values, designated by $y'$, for each x can be found by substituting each x value into the regression equation and finding $y'$. For example, when $x = 1$,

$$y' = 4.8 + 2.8x = 4.8 + (2.8)(1) = 7.6$$
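As a quick check on the equation just quoted, the sketch below refits the line to the five hypothetical data pairs and lists the predicted values. It assumes the usual least-squares formulas for slope and intercept; the variable names are illustrative only.

```python
# Hypothetical regression model from the text.
x = [1, 2, 3, 4, 5]
y = [10, 8, 12, 16, 20]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope b and intercept a for the line y' = a + b*x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Predicted value y' for each observed x, rounded to one decimal place.
predicted = [round(a + b * xi, 1) for xi in x]

print(f"y' = {a:.1f} + {b:.1f}x")
print(predicted)
```

The fitted coefficients should reproduce the intercept 4.8 and slope 2.8 given in the text.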

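[Figure 10–17: Deviations for the Regression Equation. For a point (x, y), the plot marks the total deviation $y - \bar{y}$, the explained deviation $y' - \bar{y}$, and the unexplained deviation $y - y'$.]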

Now, for each x, there is an observed y value and a predicted $y'$ value; for example, when $x = 1$, $y = 10$ and $y' = 7.6$. Recall that the closer the observed values are to the predicted values, the better the fit is and the closer r is to $+1$ or $-1$.

The total variation $\sum (y - \bar{y})^2$ is the sum of the squares of the vertical distances each point is from the mean. The total variation can be divided into two parts: that which is attributed to the relationship of x and y and that which is due to chance. The variation obtained from the relationship (i.e., from the predicted $y'$ values) is $\sum (y' - \bar{y})^2$ and is called the explained variation. Most of the variations can be explained by the relationship. The closer the value r is to $+1$ or $-1$, the better the points fit the line and the closer $\sum (y' - \bar{y})^2$ is to $\sum (y - \bar{y})^2$. In fact, if all points fall on the regression line, $\sum (y' - \bar{y})^2$ will equal $\sum (y - \bar{y})^2$, since $y'$ is equal to y in each case.

On the other hand, the variation due to chance, found by $\sum (y - y')^2$, is called the unexplained variation. This variation cannot be attributed to the relationship. When the unexplained variation is small, the value of r is close to $+1$ or $-1$. If all points fall on the regression line, the unexplained variation $\sum (y - y')^2$ will be 0. Hence, the total variation is equal to the sum of the explained variation and the unexplained variation. That is,

$$\sum (y - \bar{y})^2 = \sum (y' - \bar{y})^2 + \sum (y - y')^2$$

These values are shown in Figure 10–17. For a single point, the differences are called deviations. For the hypothetical regression model given earlier, for $x = 1$ and $y = 10$, you get $y' = 7.6$ and $\bar{y} = 13.2$.

The procedure for finding the three types of variation is illustrated next.

Step 1 Find the predicted $y'$ values.
For $x = 1$    $y' = 4.8 + 2.8x = 4.8 + (2.8)(1) = 7.6$
For $x = 2$    $y' = 4.8 + (2.8)(2) = 10.4$
For $x = 3$    $y' = 4.8 + (2.8)(3) = 13.2$
For $x = 4$    $y' = 4.8 + (2.8)(4) = 16.0$
For $x = 5$    $y' = 4.8 + (2.8)(5) = 18.8$

Hence, the values for this example are as follows:

x    y     y'
1    10    7.6
2    8     10.4
3    12    13.2
4    16    16.0
5    20    18.8

Step 2 Find the mean of the y values.
$$\bar{y} = \frac{10 + 8 + 12 + 16 + 20}{5} = 13.2$$

Step 3 Find the total variation $\sum (y - \bar{y})^2$.
$(10 - 13.2)^2 = 10.24$
$(8 - 13.2)^2 = 27.04$
$(12 - 13.2)^2 = 1.44$
$(16 - 13.2)^2 = 7.84$
$(20 - 13.2)^2 = 46.24$
$\sum (y - \bar{y})^2 = 92.8$

Step 4 Find the explained variation $\sum (y' - \bar{y})^2$.
$(7.6 - 13.2)^2 = 31.36$
$(10.4 - 13.2)^2 = 7.84$
$(13.2 - 13.2)^2 = 0.00$
$(16 - 13.2)^2 = 7.84$
$(18.8 - 13.2)^2 = 31.36$
$\sum (y' - \bar{y})^2 = 78.4$

Unusual Stat: There are 1,929,770,126,028,800 different color combinations for Rubik's cube and only one correct solution in which all the colors of the squares on each face are the same.

相關與迴歸

10

2

Step 5 Find the unexplained variation $\sum (y - y')^2$.

(10 − 7.6)² = 5.76
(8 − 10.4)² = 5.76
(12 − 13.2)² = 1.44
(16 − 16)² = 0.00
(20 − 18.8)² = 1.44
Σ(y − y′)² = 14.4

Notice that

Total variation = explained variation + unexplained variation
92.8 = 78.4 + 14.4

Note: The values y − y′ are called residuals. A residual is the difference between the actual value of y and the predicted value y′ for a given x value. The mean of the residuals is always zero. As stated previously, the regression line determined by the formulas in Section 10–2 is the line that best fits the points of the scatter plot. The sum of the squares of the residuals computed by using the regression line is the smallest possible value. For this reason, a regression line is also called a least-squares line.

Historical Note: In the 19th century, astronomers such as Gauss and Laplace used what is called the principle of least squares, based on measurement errors, to determine the shape of Earth. It is now used in regression theory.

Unusual Stat: There are 1,929,770,126,028,800 different color combinations for Rubik's cube and only one correct solution, in which all the squares on each face are the same color.
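The decomposition in Step 5 can be checked numerically. The following is a minimal sketch, assuming the usual definitions of explained variation, Σ(y′ − ȳ)², and total variation, Σ(y − ȳ)²; the variable names are illustrative and not from the text.

```python
# Running example: five (x, y) pairs and the fitted line y' = 4.8 + 2.8x.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 12, 16, 20]
a, b = 4.8, 2.8  # intercept and slope taken from the text

y_mean = sum(ys) / len(ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# Assumed standard definitions of the three variations.
explained = sum((y_hat - y_mean) ** 2 for y_hat in predicted)
unexplained = sum(res ** 2 for res in residuals)
total = sum((y - y_mean) ** 2 for y in ys)
mean_residual = sum(residuals) / len(residuals)

print(f"explained   = {explained:.1f}")
print(f"unexplained = {unexplained:.1f}")
print(f"total       = {total:.1f}")
```

Up to floating-point rounding, the three values reproduce 78.4, 14.4, and 92.8, and `mean_residual` is zero, as the note on residuals states.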

Residual Plots

As previously stated, the values y − y′ are called residuals (sometimes called the prediction errors). These values can be plotted with the x values, and the plot, called a residual plot, can be used to determine how well the regression line can be used to make predictions.

The residuals for the previous example are calculated as shown.

x    y     y′      y − y′ = residual
1    10    7.6     10 − 7.6 = 2.4
2    8     10.4    8 − 10.4 = −2.4
3    12    13.2    12 − 13.2 = −1.2
4    16    16      16 − 16 = 0
5    20    18.8    20 − 18.8 = 1.2

The x values are plotted using the horizontal axis, and the residuals are plotted using the vertical axis. Since the mean of the residuals is always zero, a horizontal line is drawn at a residual value of zero, as shown in Figure 10–18. Plot the x and residual values as shown in Figure 10–18.

x         1      2       3       4     5
y − y′    2.4    −2.4    −1.2    0     1.2

Figure 10–18 Residual Plot (horizontal axis: x, from 1 to 5; vertical axis: y − y′, from −3 to 3)

To interpret a residual plot, you need to determine if the residuals form a pattern. Figure 10–19 shows four examples of residual plots. If the residual values are more or less evenly distributed about the line, as shown in Figure 10–19(a), then the relationship between x and y is linear and the regression line can be used to make predictions. This means that the standard deviation of the dependent variable must be the same for each value of the independent variable. This is called the homoscedasticity assumption. See assumption 3 in Section 10–2.

Figure 10–19(b) shows that the variance of the residuals increases as the values of x increase. This means that the regression line is not suitable for predictions.

Figure 10–19(c) shows a curvilinear relationship between the x values and the residual values; hence, the regression line is not suitable for making predictions.

Figure 10–19(d) shows that as the x values increase, the residuals increase and become more dispersed. This means that the regression line is not suitable for making predictions.

The residual plot in Figure 10–18 shows that the regression line y′ = 4.8 + 2.8x is somewhat questionable for making predictions because the sample is small.

Figure 10–19 Examples of Residual Plots. Panels (a), (b), (c), and (d) each plot the residuals y − y′ (vertical axis) against x (horizontal axis), with a horizontal reference line at 0.

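Coefficient of Determination

Objective 5 Compute the coefficient of determination.

The coefficient of determination is the ratio of the explained variation to the total variation and is denoted by r². That is,

$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$

For the example, r² = 78.4/92.8 = 0.845. The term r² is usually expressed as a percentage. So in this case, 84.5% of the total variation is explained by the regression line using the independent variable.

Another way to arrive at the value for r² is to square the correlation coefficient. In this case, r = 0.919 and r² = 0.845, which is the same value found by using the variation ratio.

The coefficient of determination is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable. The symbol for the coefficient of determination is r².

Of course, it is usually easier to find the coefficient of determination by squaring r and converting it to a percentage. Therefore, if r = 0.90, then r² = 0.81, which is equivalent to 81%. This result means that 81% of the variation in the dependent variable is accounted for by the variation in the independent variable. The rest of the variation, 0.19, or 19%, is unexplained. This value is called the coefficient of nondetermination and is found by subtracting the coefficient of determination from 1. As the value of r approaches 0, r² decreases more rapidly. For example, if r = 0.6, then r² = 0.36, which means that only 36% of the variation in the dependent variable can be attributed to the variation in the independent variable.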
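The two routes to r² described above can be compared with a few lines of arithmetic. This is only an illustrative sketch; the variable names are assumptions, and the positive square root is taken because the slope of the example's regression line is positive.

```python
import math

# Variations from the running example in the text.
explained, total = 78.4, 92.8

r_squared = explained / total        # ratio of explained to total variation
nondetermination = 1 - r_squared     # coefficient of nondetermination
r = math.sqrt(r_squared)             # positive root: the fitted slope is positive

print(f"r^2     = {r_squared:.3f} ({r_squared:.1%})")
print(f"1 - r^2 = {nondetermination:.3f}")
print(f"r       = {r:.3f}")

# Squaring r directly, as in the r = 0.90 and r = 0.6 illustrations.
for r_value in (0.90, 0.6):
    print(f"r = {r_value:.2f} gives r^2 = {r_value ** 2:.2f}")
```

The ratio gives 0.845 (84.5%), and taking its square root recovers r = 0.919, so both routes agree for this example.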

Coefficient of Nondetermination

$$1.00 - r^2$$

Standard Error of the Estimate

Objective 6 Compute the standard error of the estimate.

When a y′ value is predicted for a specific x value, the prediction is a point prediction. However, a prediction interval about the y′ value can be constructed, just as a confidence interval was constructed for an estimate of the population mean. The prediction interval uses a statistic called the standard error of the estimate.

The standard error of the estimate, denoted by $s_{est}$, is the standard deviation of the observed y values about the predicted y′ values. The formula for the standard error of the estimate is

$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}$$

The standard error of the estimate is similar to the standard deviation, but the mean is not used. As can be seen from the formula, the standard error of the estimate is the square root of the unexplained variation (that is, the variation due to the difference between the observed values and the expected values) divided by n − 2. So the closer the observed values are to the predicted values, the smaller the standard error of the estimate will be.

Example 10–12 shows how to compute the standard error of the estimate.

Example 10–12 Copy Machine Maintenance Costs

A researcher collects the following data and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression equation is y′ = 55.57 + 8.13x. Find the standard error of the estimate.

Machine    Age x (years)    Monthly cost y
A          1                $62
B          2                78
C          3                70
D          4                90
E          4                93
F          6                103

Solution

Step 1 Make a table with columns labeled x, y, y′, y − y′, and (y − y′)². Enter the x and y values.

Step 2 Using the regression line equation y′ = 55.57 + 8.13x, compute the predicted values y′ for each x and place the results in the column labeled y′.

x = 1    y′ = 55.57 + (8.13)(1) = 63.70
x = 2    y′ = 55.57 + (8.13)(2) = 71.83
x = 3    y′ = 55.57 + (8.13)(3) = 79.96
x = 4    y′ = 55.57 + (8.13)(4) = 88.09
x = 6    y′ = 55.57 + (8.13)(6) = 104.35

Step 3 For each y, subtract y′ and place the answer in the column labeled y − y′.

62 − 63.70 = −1.70
78 − 71.83 = 6.17
70 − 79.96 = −9.96
90 − 88.09 = 1.91
93 − 88.09 = 4.91
103 − 104.35 = −1.35

Step 4 Square the numbers found in step 3 and place the squares in the column labeled (y − y′)².

Step 5 Find the sum of the numbers in the last column. The completed table is shown.

x    y      y′        y − y′    (y − y′)²
1    62     63.70     −1.70     2.89
2    78     71.83     6.17      38.0689
3    70     79.96     −9.96     99.2016
4    90     88.09     1.91      3.6481
4    93     88.09     4.91      24.1081
6    103    104.35    −1.35     1.8225
                      Sum       169.7392

Step 6 Substitute in the formula and find $s_{est}$.

$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} = \sqrt{\frac{169.7392}{6 - 2}} = 6.51$$

In this case, the standard deviation of observed values about the predicted values is 6.51.

The standard error of the estimate can also be found by using the formula

$$s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}}$$

Example 10–13

Find the standard error of the estimate for the data for Example 10–12 by using the preceding formula. The equation of the regression line is y′ = 55.57 + 8.13x.

Solution

Step 1 Make a table.

Step 2 Find the product of x and y values, and place the results in the third column.

Step 3 Square the y values, and place the results in the fourth column.

Step 4 Find the sums of the second, third, and fourth columns. The completed table is shown here.

x    y            xy            y²
1    62           62            3,844
2    78           156           6,084
3    70           210           4,900
4    90           360           8,100
4    93           372           8,649
6    103          618           10,609
     Σy = 496     Σxy = 1778    Σy² = 42,186

Step 5 From the regression equation y′ = 55.57 + 8.13x, a = 55.57 and b = 8.13.

Step 6 Substitute in the formula and solve for $s_{est}$.

$$s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}} = \sqrt{\frac{42{,}186 - (55.57)(496) - (8.13)(1778)}{6 - 2}} = 6.48$$

This value is close to the value found in Example 10–12. The difference is due to rounding.
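The two formulas used in Examples 10–12 and 10–13 can be compared side by side. The sketch below is illustrative only; the function names `s_est_from_residuals` and `s_est_shortcut` are hypothetical, and the rounded coefficients a = 55.57 and b = 8.13 are taken from the text.

```python
import math

# Copy machine data from Example 10-12.
xs = [1, 2, 3, 4, 4, 6]
ys = [62, 78, 70, 90, 93, 103]
a, b = 55.57, 8.13  # rounded intercept and slope from the text


def s_est_from_residuals(xs, ys, a, b):
    """Definition: square root of sum((y - y')^2) / (n - 2)."""
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(unexplained / (len(xs) - 2))


def s_est_shortcut(xs, ys, a, b):
    """Shortcut: square root of (sum(y^2) - a*sum(y) - b*sum(xy)) / (n - 2)."""
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_y2 = sum(y * y for y in ys)
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (len(xs) - 2))


definition_value = s_est_from_residuals(xs, ys, a, b)
shortcut_value = s_est_shortcut(xs, ys, a, b)
print(f"definition: {definition_value:.2f}")
print(f"shortcut:   {shortcut_value:.2f}")
```

The results round to 6.51 and 6.48, matching the two examples. The gap comes from using a and b rounded to two decimal places; with unrounded least-squares coefficients the two formulas are algebraically identical.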

Prediction Interval

Objective 7 Find a prediction interval.

The standard error of the estimate can be used for constructing a prediction interval (similar to a confidence interval) about a y′ value.

When a specific value x is substituted into the regression equation, the y′ that you get is a point estimate for y. For example, if the regression line equation for the age of a machine and the monthly maintenance cost is y′ = 55.57 + 8.13x (Example 10–12), then the predicted maintenance cost for a 3-year-old machine would be y′ = 55.57 + 8.13(3), or $79.96. Since this is a point estimate, you have no idea how accurate it is. But you can construct a prediction interval about the estimate. By selecting an α value, you can achieve (1 − α) · 100% confidence that the interval contains the actual value of y that corresponds to the given value of x.

An interval is needed because there are several possible sources of prediction error in finding the regression line equation. One source occurs when finding the standard error of the estimate $s_{est}$. Two others are errors made in estimating the slope and the y′ intercept, since the equation of the regression line will change somewhat if different random samples are used when calculating the equation.

Formula for the Prediction Interval about a Value y′

$$y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}}$$

with d.f. = n − 2.

Example 10–14

For the data in Example 10–12, find the 95% prediction interval for the monthly maintenance cost of a machine that is 3 years old.

Solution

Step 1 Find Σx, Σx², and $\bar{X}$.

Σx = 20    Σx² = 82    $\bar{X}$ = 20/6 = 3.3

Step 2 Find y′ for x = 3.

y′ = 55.57 + 8.13x = 55.57 + 8.13(3) = 79.96

Step 3 Find $s_{est}$.

$s_{est}$ = 6.48, as shown in Example 10–13.

Step 4 Substitute in the formula and solve: $t_{\alpha/2}$ = 2.776, d.f. = 6 − 2 = 4 for 95%.

$$y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}}$$

$$79.96 - (2.776)(6.48) \sqrt{1 + \frac{1}{6} + \frac{6(3 - 3.3)^2}{6(82) - (20)^2}} < y < 79.96 + (2.776)(6.48) \sqrt{1 + \frac{1}{6} + \frac{6(3 - 3.3)^2}{6(82) - (20)^2}}$$

79.96 − (2.776)(6.48)(1.08) < y < 79.96 + (2.776)(6.48)(1.08)
79.96 − 19.43 < y < 79.96 + 19.43
60.53 < y < 99.39

Hence, you can be 95% confident that the interval 60.53 < y < 99.39 contains the actual value of y.
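The substitution in Step 4 can be wrapped in a small function. This is a sketch under stated assumptions: the function name `prediction_interval` is hypothetical, the critical value 2.776 is read from a t table as in the example rather than computed, and the mean of x is kept unrounded (20/6) instead of the 3.3 used in the hand calculation.

```python
import math


def prediction_interval(x0, xs, a, b, s_est, t_crit):
    """Prediction interval about y' = a + b*x0, using the formula in the text."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n
    y_pred = a + b * x0
    spread = math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    margin = t_crit * s_est * spread
    return y_pred - margin, y_pred + margin


# Example 10-14: 3-year-old machine, s_est from Example 10-13,
# t = 2.776 for d.f. = 4 at 95% (table value, as given in the text).
xs = [1, 2, 3, 4, 4, 6]
lower, upper = prediction_interval(3, xs, a=55.57, b=8.13, s_est=6.48, t_crit=2.776)
print(f"{lower:.2f} < y < {upper:.2f}")
```

Because the mean of x is not rounded here, the limits come out at roughly 60.47 and 99.45 rather than 60.53 and 99.39; the small difference is due only to the intermediate rounding in the hand calculation.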


Applying the Concepts 10–3

Interpreting Simple Linear Regression

Answer the questions about the following computer-generated information.

Linear correlation coefficient r = 0.794556
Coefficient of determination = 0.631319
Standard error of estimate = 12.9668
Explained variation = 5182.41
Unexplained variation = 3026.49
Total variation = 8208.90
Equation of regression line y′ = 0.725983X + 16.5523
Level of significance = 0.1
Test statistic = 0.794556
Critical value = 0.378419

1. Are both variables moving in the same direction?
2. Which number measures the distances from the prediction line to the actual values?
3. Which number is the slope of the regression line?
4. Which number is the y intercept of the regression line?
5. Which critical value can be found in a table?
6. Which number is the allowable risk of making a type I error?
7. Which number measures the variation explained by the regression?
8. Which number measures the scatter of points about the regression line?
9. What is the null hypothesis?
10. Which number is compared to the critical value to see if the null hypothesis should be rejected?
11. Should the null hypothesis be rejected?

See page 509 for the answers.

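Exercises 10–3

1. What is meant by the explained variation? How is it computed?
2. What is meant by the unexplained variation? How is it computed?
3. What is meant by the total variation? How is it computed?
4. How is the coefficient of determination found?
5. How is the coefficient of nondetermination found?

For Exercises 6 through 8, find the coefficients of determination and nondetermination and explain the meaning of each.

6. r = 0.75
7. r = 0.42
8. r = 0.91
9. Compute the standard error of the estimate for Exercise 7 in Section 10–1. For the regression line equation, see the result of Exercise 7 in Section 10–2.
10. Compute the standard error of the estimate for Exercise 8 in Section 10–1. For the regression line equation, see the result of Exercise 8 in Section 10–2.
11. For the data in Exercise 7 in Sections 10–1 and 10–2 and in Exercise 9 in Section 10–3, find the 90% prediction interval when x = 200.
12. For the data in Exercise 8 in Sections 10–1 and 10–2 and in Exercise 10 in Section 10–3, find the 90% prediction interval when x = 4.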

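Summary

• Many relationships exist among variables in the real world. One way to determine whether a linear relationship exists is to use the statistical techniques known as correlation and regression. The strength and direction of a linear relationship are usually measured by the correlation coefficient, which takes values between −1 and +1. The closer the correlation coefficient is to −1 or +1, the stronger the linear relationship between the variables; a value of exactly −1 or +1 indicates a perfect linear relationship. A positive relationship between two variables means that small values of the independent variable go with small values of the dependent variable, and large values go with large values. A negative relationship means that small values of the independent variable go with large values of the dependent variable, and large values go with small values. (10-1)
• Remember that a significant relationship between two variables does not necessarily mean that one variable directly causes the other. In some cases it does, but other possibilities should also be considered: a complex relationship involving other (perhaps unknown) variables, a third variable that interacts with both variables, or pure chance. (10-1)
• Relationships can be linear or curvilinear. To determine the shape, draw a scatter plot of the variables. If the relationship is linear, the data can be approximated by a straight line, called the regression line or the line of best fit. The closer the correlation coefficient r is to −1 or +1, the more closely the data points fit the regression line. (10-2)
• A residual plot can be used to determine whether the regression line is suitable for making predictions. (10-3)
• The coefficient of determination is a better indicator of the strength of a linear relationship than the correlation coefficient, because it gives the percentage of the variation in the dependent variable that is directly attributable to the variation in the independent variable. It is found by squaring the correlation coefficient and converting the result to a percentage. (10-3)
• Another statistic used in correlation and regression is the standard error of the estimate, which is an estimate of the standard deviation of the y values about the predicted y′ values. The standard error of the estimate can be used to construct a prediction interval about a specific value of x. (10-3)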
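The squaring step described for the coefficient of determination can be sketched in Python. This is a minimal illustration only: the function name `determination_percentages` and the sample value r = 0.80 are hypothetical and do not come from the text. The coefficient of nondetermination is taken to be the remaining percentage, 1 − r².

```python
def determination_percentages(r):
    """Return (coefficient of determination, coefficient of nondetermination) as percentages."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    r_squared = r * r
    return 100.0 * r_squared, 100.0 * (1.0 - r_squared)


# Hypothetical value of r, chosen only for illustration.
explained_pct, unexplained_pct = determination_percentages(0.80)
print(f"{explained_pct:.1f}% of the variation in y is explained by x; "
      f"{unexplained_pct:.1f}% is not")
```

Because r is squared, r = 0.80 and r = −0.80 give the same percentages; the sign of r only describes the direction of the relationship.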

Important Terms

coefficient of determination
correlation
correlation coefficient
dependent variable
extrapolation
independent variable
influential point or observation
least-squares line
lurking variable
marginal change
multiple relationship
negative relationship
Pearson product moment correlation coefficient
population correlation coefficient
positive relationship
prediction interval
regression
regression line
residual
residual plot
scatter plot
simple relationship
standard error of the estimate

Important Formulas

Formula for the correlation coefficient:

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2]\,[n(\sum y^2) - (\sum y)^2]}}$$

Formula for the t test for the correlation coefficient:

$$t = r\sqrt{\frac{n-2}{1-r^2}} \qquad \text{d.f.} = n - 2$$

The regression line equation:

$$y' = a + bx$$

where

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \qquad b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

Formula for the standard error of the estimate:

$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n-2}} \qquad \text{or} \qquad s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n-2}}$$

Formula for the prediction interval for a value y′:

$$y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}} \;<\; y \;<\; y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}}$$

$$\text{d.f.} = n - 2$$
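The chapter's formulas for r, the t test value, the regression line, the standard error of the estimate, and the prediction interval can be sketched in plain Python as follows. Everything here is illustrative: the function names, the five-point data set, and the choice x = 3 are hypothetical, and the critical value 2.353 is assumed to be read from a standard t table (two tails, α = 0.10, d.f. = 3) rather than computed.

```python
import math


def sums(xs, ys):
    """Return n and the five sums that appear in the chapter's formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return n, sx, sy, sxy, sx2, sy2


def correlation(xs, ys):
    """Correlation coefficient r."""
    n, sx, sy, sxy, sx2, sy2 = sums(xs, ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))


def t_statistic(r, n):
    """t test value for the correlation coefficient, with d.f. = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r**2))


def regression_line(xs, ys):
    """Return (a, b) for the line y' = a + bx."""
    n, sx, sy, sxy, sx2, _ = sums(xs, ys)
    denom = n * sx2 - sx**2
    a = (sy * sx2 - sx * sxy) / denom
    b = (n * sxy - sx * sy) / denom
    return a, b


def standard_error(xs, ys):
    """Standard error of the estimate, using the computational form."""
    n, _, sy, sxy, _, sy2 = sums(xs, ys)
    a, b = regression_line(xs, ys)
    return math.sqrt((sy2 - a * sy - b * sxy) / (n - 2))


def prediction_interval(xs, ys, x0, t_crit):
    """Prediction interval for y at x = x0; t_crit must come from a t table."""
    n, sx, _, _, sx2, _ = sums(xs, ys)
    a, b = regression_line(xs, ys)
    y_pred = a + b * x0
    x_bar = sx / n
    margin = t_crit * standard_error(xs, ys) * math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sx2 - sx**2)
    )
    return y_pred - margin, y_pred + margin


# Hypothetical data, not taken from the chapter.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
t = t_statistic(r, len(xs))
a, b = regression_line(xs, ys)
s_est = standard_error(xs, ys)
low, high = prediction_interval(xs, ys, x0=3, t_crit=2.353)  # assumed table value
print(f"r = {r:.3f}, t = {t:.3f}, y' = {a:.2f} + {b:.2f}x, s_est = {s_est:.3f}")
print(f"90% prediction interval at x = 3: {low:.2f} < y < {high:.2f}")
```

Passing the critical value in as an argument keeps the sketch free of third-party libraries and mirrors the textbook workflow of looking t up in a table.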

Review Exercises

For Exercises 1 through 4, do a complete regression analysis by performing the following steps.
a. Draw the scatter plot.
b. Compute the value of the correlation coefficient.
c. Test the significance of the correlation coefficient at α = 0.01, using Table I.
d. Determine the regression line equation.
e. Plot the regression line on the scatter plot.
f. Predict y′ for a specific value of x.

1. Passengers and Airline Fares: The U.S. Department of Transportation Office of Aviation Analysis provides the weekly average number of passengers per flight and the average one-way economy fare in dollars for common commercial routes. Randomly selected flights are listed below with the reported data. Is there evidence of a relationship between these two variables? (10-1)(10-2)

Flight                        Avg. no. of passengers x    Avg. one-way fare y
Pittsburgh–Washington, DC     310                         $236
Chicago–Pittsburgh            1388                        105
Cincinnati–New York City      750                         339
Denver–Phoenix                3019                        96
Denver–Los Angeles            2151                        176
Houston–Philadelphia          1104                        180

Source: www.fedstats.gov

2. Touchdowns and QB Ratings: Listed below are the number of touchdown passes thrown in the season and the quarterback rating for a random sample of NFL quarterbacks. Is there a significant relationship between the two variables? (10-1)(10-2)

TDs          34    21    15    22    34    26    23
QB rating    106   89    82    81    96    91    86

Source: New York Times Almanac.

3. Typing Speed and Word Processing: A researcher wants to know whether the typing speed of a secretary (in words per minute) is related to the time (in hours) it takes the secretary to learn a new word processing program. The data are shown below. If there is a significant relationship, predict the time it will take a secretary who types 72 words per minute to learn the new word processing program. (10-1)(10-2)

Speed x    48   74   52   79    83   56   85    63   88    74    90    92
Time y     7    4    8    3.5   2    6    2.3   5    2.1   4.5   1.9   1.5

4. Medical Specialties and Gender: Although more women are becoming physicians every year, male physicians still greatly outnumber them in many specialties. Listed below are randomly selected specialties with the numbers of practicing female and male physicians. Is there a significant relationship between the two numbers? Predict the number of practicing male physicians in a specialty with 2000 female physicians. (10-1)(10-2)

Specialty               Female physicians x    Male physicians y
Dermatology             3,482                  6,506
Emergency medicine      5,098                  20,429
Neurology               2,895                  10,088
Pediatric cardiology    459                    1,241
Radiology               1,218                  7,574
Forensic pathology      181                    399
Radiation oncology      968                    3,215

Source: World Almanac.

5. For Exercise 3, find the standard error of the estimate.
6. For Exercise 3, find the 90% prediction interval for the time needed when the typing speed is 72 words per minute.

Chapter Quiz

Determine whether each statement is true or false. If the statement is false, explain why.

1. A negative relationship between two variables means that, for the most part, as the x variable increases, the y variable increases.
2. Even if the correlation coefficient is high (near +1) or low (near −1), it may still not be significant.
3. A significant correlation coefficient cannot be due to chance alone.

Select the best answer.

4. Which number indicates the strength of the linear relationship between two quantitative variables?
a. r
b. a
c. x
d. s_est
5. The degrees of freedom for the significance test of the correlation coefficient r are
a. 1
b. n
c. n − 1
d. n − 2
6. The symbol for the coefficient of determination is
a. r
b. r²
c. a
d. b

Complete the following statements with the best answer.

7. The x variable is called the ________ variable.
8. The sign of the correlation coefficient r is always the same as the sign of ________.
9. If all the data points fall on a straight line, the correlation coefficient is either ________ or ________.

For Exercises 10 and 11, do a complete regression analysis by performing the following steps.
a. Draw the scatter plot.
b. Compute the value of the correlation coefficient.
c. Test the significance of the correlation coefficient at α = 0.05, using Table I.
d. Determine the regression line equation.
e. Plot the regression line on the scatter plot.
f. Predict y′ for a specific value of x.

10. Age and Driving Accidents: A study is conducted to determine the relationship between a driver's age and the number of accidents he or she has over a 1-year period. The data are shown here. If there is a significant relationship, predict the number of accidents of a driver who is 64.

Driver's age x        63   65   60   62   66   67   59
No. of accidents y    2    3    1    0    3    1    4

11. Fat and Cholesterol: A study is conducted with a group of dieters to see whether the number of grams of fat each consumes per day is related to cholesterol level. The data are shown here. If there is a significant relationship, predict the cholesterol level of a dieter who consumes 8.5 grams of fat per day.

Fat grams x            6.8   5.5   8.2   10    8.6   9.1   8.6   10.4
Cholesterol level y    183   201   193   283   222   250   190   218

12. For Exercise 11, find the standard error of the estimate.
13. For Exercise 11, find the 95% prediction interval of the cholesterol level of a person who consumes 10 grams of fat.

Answers to Applying the Concepts

Applying the Concepts 10-1 Stopping Distances

1. The independent variable is miles per hour (MPH).
2. The dependent variable is braking distance (feet).
3. The independent variable, miles per hour, is a continuous quantitative variable.
4. The dependent variable, braking distance, is a continuous quantitative variable.
5. A scatter plot of the data is shown.

[Figure: scatter plot of braking distance (vertical axis) versus MPH (horizontal axis)]

6. There may be a linear relationship between the two variables, but the data appear to follow a curve.
7. Changing the spacing between the values of the independent variable changes the appearance of the relationship.
8. The relationship between the two variables is positive: the higher the speed, the longer the braking distance.
9. The strong relationship between the two variables suggests that braking distance can be predicted accurately from MPH. We should still be concerned about the curvilinear pattern in the data.
10. Answers will vary. Other variables that might affect braking distance include road conditions, driver reaction time, and the condition of the brakes.
11. The correlation coefficient is r = 0.966.
12. The correlation coefficient r = 0.966 is significant at α = 0.05. This is consistent with the strong positive relationship between the two variables.

Applying the Concepts 10-2 Stopping Distances Revisited

1. The regression line equation is y′ = −151.90 + 6.4514x.
2. The slope says that for each additional 1 MPH, the braking distance increases by 6.45 feet on average. The y intercept is the braking distance at a speed of 0 MPH; it has no meaning in context, but it is an important part of the model.
3. y′ = −151.90 + 6.4514(45) = 138.4. When MPH = 45, the braking distance is about 138 feet.
4. y′ = −151.90 + 6.4514(100) = 493.2. When MPH = 100, the braking distance is about 493 feet.
5. It is not appropriate to predict braking distance outside the range of the MPH data (for example, above 100 MPH), because we do not know the relationship between the two variables outside that range.
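The predictions in answers 3 and 4, together with the extrapolation warning in answer 5, can be sketched in Python. The intercept and slope come from answer 1; the function name and the `max_mph` guard are hypothetical, with the default of 100 MPH taken from the example in answer 5 rather than from the data set itself, which is not reproduced here.

```python
A = -151.90  # y intercept from answer 1
B = 6.4514   # slope from answer 1 (feet per additional MPH)


def predict_braking_distance(mph, max_mph=100.0):
    """Predicted braking distance in feet from y' = A + B * mph.

    max_mph is an assumed upper limit for the data; speeds above it are
    rejected instead of being extrapolated.
    """
    if mph > max_mph:
        raise ValueError(f"{mph} MPH is outside the assumed data range")
    return A + B * mph


for speed in (45, 100):
    print(f"{speed} MPH -> about {predict_braking_distance(speed):.1f} feet")
```

Raising an error for out-of-range speeds is one possible design choice; returning the value with a warning would also fit the advice in answer 5.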

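Applying the Concepts 10-3 Interpreting Simple Linear Regression

1. Yes. Both variables are moving in the same direction; in other words, the two variables are positively associated.
2. The unexplained variation, 3026.49, measures the distances from the prediction line to the actual values.
3. The slope of the regression line is 0.725983.
4. The y intercept of the regression line is 16.5523.
5. The critical value, 0.378419, can be found in a table.
6. The allowable risk of making a type I error is 0.10, the level of significance.
7. The variation explained by the regression is 0.631319, or about 63.1%.
8. The scatter of the data points about the regression line is 12.9668, the standard error of the estimate.
9. The null hypothesis is that there is no correlation, H0: ρ = 0.
10. We compare the test statistic, 0.794556, with the critical value to decide whether the null hypothesis should be rejected.
11. Since 0.794556 > 0.378419, we reject the null hypothesis and find that there is enough evidence to conclude that the correlation coefficient is not equal to 0.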
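As a rough consistency check on these answers, the figures in the output can be tied together in Python. The sketch assumes the standard identity r² = 1 − (unexplained variation)/(total variation) and assumes that the reported test statistic is r itself, compared directly with a critical value of r; neither assumption is stated in the output, and the variable names are hypothetical.

```python
import math

unexplained_variation = 3026.49
total_variation = 8208.90
critical_value = 0.378419

# Coefficient of determination from the two variations (assumed identity).
r_squared = 1 - unexplained_variation / total_variation

# The slope is positive (answer 3), so the positive square root is taken.
r = math.sqrt(r_squared)

reject_null = abs(r) > critical_value

print(f"r^2 is about {r_squared:.4f}, r is about {r:.4f}")
print("Reject H0: rho = 0" if reject_null else "Do not reject H0: rho = 0")
```

The values agree with answers 7, 10, and 11 to about four decimal places; small differences in later digits are expected because the printed variations are rounded.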

