MATH 11 SEMI-STUDY GUIDE
Professor: Jason Schweinsberg
Supplemental Instruction: Juan Djuwadi
Quarter: Winter 2017
Measures of Center
The Median: The middle value, with half the values below it and half above it. It is the value that divides the histogram into two equal areas.
(Figure: a histogram split at the median; both sides have equal areas.)
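Finding the Median: ALWAYS put the values in order first. Let n be the number of values. The median sits at position (n + 1)/2 in the ordered list:
n is odd: the median is the center value, at position (n + 1)/2.
n is even: (n + 1)/2 falls halfway between two positions, so the median is the average of the two middle values (positions n/2 and n/2 + 1).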
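The position rule above can be sketched in Python as a minimal illustration. The function name `median` is our own choice, not something from the course; Python's standard-library `statistics.median` uses the same even-n convention (average of the two middle values).

```python
def median(values):
    """Median via the (n + 1)/2 position rule."""
    ordered = sorted(values)      # ALWAYS put the values in order first
    n = len(ordered)
    mid = n // 2                  # 0-based index of the (upper) middle value
    if n % 2 == 1:
        # n odd: position (n + 1)/2, counting from 1, is index n // 2
        return ordered[mid]
    # n even: average the values at positions n/2 and n/2 + 1 (counting from 1)
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 3]))   # 4.0
```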
The median is good for summarizing the center of a distribution, even when the shape is skewed or outliers are present. It is statistically ROBUST: resistant to outliers.
The Mean: Another measure of center. It is the average of all the values.
Finding the Mean: Let y denote each individual observation and n the number of observations:
$\bar{y} = \frac{\sum y}{n}$
To find the mean (often written with a bar over the variable, as in ȳ, read “y-bar”), add up all the values and divide the sum by the number of data points. The mean is the balancing point of the histogram.
(Figure: a histogram balanced at its mean.)
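A minimal Python sketch of the mean formula, using a hypothetical data set with one extreme value to contrast the mean with the robust median (taken from Python's standard `statistics` module). The name `sample_mean` is our own.

```python
from statistics import median


def sample_mean(values):
    """y-bar: add up all the values, then divide by the number of values n."""
    return sum(values) / len(values)


data = [2, 4, 4, 5, 100]      # hypothetical data with one extreme value
print(sample_mean(data))      # 23.0, dragged upward by the outlier
print(median(data))           # 4, barely affected: the median is robust
```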
Measures of Spread
The center alone doesn't tell us much. Spread gives a clearer picture of the patterns within the data. Spread is always reported together with the center of the distribution, so the two are usually presented hand-in-hand.
The Range: The difference between the maximum and minimum values.
$\text{Range} = \text{Max Value} - \text{Min Value}$
Because the range depends only on the two most extreme values, a single outlier can greatly affect it, so it is not robust.
The Interquartile Range: Knowing this, it may be better to ignore the end values and focus on the spread within certain boundaries. The IQR is the range of the middle half of the data. Split the ordered observations at the median, then find the median of each half. This divides the data into four quarters:
Lower Quartile (Q1): 25th percentile
Median: 50th percentile
Upper Quartile (Q3): 75th percentile
$\text{IQR} = \text{Upper Quartile} - \text{Lower Quartile}$
The IQR is a reasonable summary of spread even for skewed distributions or in the presence of outliers.
The Standard Deviation: Sometimes we want to know how individual data points ‘perform’ against the others. The IQR can't tell us how far a single point varies from the center, so the standard deviation is useful in ways the IQR is not.
Finding the Standard Deviation:
$\text{Variance} = s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
$\text{Standard Deviation} = s = \sqrt{\text{Variance}} = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$
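The range, IQR, variance, and standard deviation formulas can be sketched together in Python. This is a minimal illustration with hypothetical data; the helper name `spread_summary` is ours, and the quartile rule (median of each half, leaving out the overall median when n is odd) is one convention among several, so calculators and statistical software may report slightly different quartiles.

```python
import math
from statistics import median


def spread_summary(values):
    """Return (range, Q1, Q3, IQR, variance, standard deviation); needs n >= 2."""
    ordered = sorted(values)
    n = len(ordered)
    data_range = ordered[-1] - ordered[0]      # Range = max - min

    # Quartiles as medians of the lower and upper halves. Assumption: when n
    # is odd, the overall median is left out of both halves; other
    # conventions exist and can give slightly different quartiles.
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    q1, q3 = median(lower_half), median(upper_half)
    iqr = q3 - q1                              # IQR = upper - lower quartile

    ybar = sum(ordered) / n
    variance = sum((y - ybar) ** 2 for y in ordered) / (n - 1)  # divide by n - 1
    sd = math.sqrt(variance)                   # square root restores the units
    return data_range, q1, q3, iqr, variance, sd


# Hypothetical data: range 7, Q1 = 4, Q3 = 6, IQR = 2, variance = 32/7
print(spread_summary([2, 4, 4, 4, 5, 5, 7, 9]))
```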
The standard deviation measures how far away (on average) points deviate from the mean.
Why do we square? Some points deviate positively from the mean and others negatively. Without squaring, those deviations cancel each other out: the deviations from the mean always sum to zero, so their average would be zero too.
Why take the square root of the variance to get the standard deviation? It's a matter of units. Squaring each point's deviation from the mean also squares its units, so we take the square root of the final result to return to the units of the data we are observing.
Like the mean, the standard deviation is appropriate mainly for roughly symmetric data; neither is statistically robust.
Skewed Distributions: Think in terms of tails. A distribution skewed left has a longer tail on the left; a distribution skewed right has a longer tail on the right. The skew pulls the mean toward the side of the tail, because the mean isn't statistically robust: relatively extreme values can drastically change it. In a symmetric distribution, the mean and median are close together.
(Figures: a distribution skewed left, a distribution skewed right, and a symmetric distribution.)
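A short Python check of two claims above, using hypothetical right-skewed data: unsquared deviations from the mean sum to zero, and a long right tail pulls the mean above the median. The median comes from Python's standard `statistics` module.

```python
from statistics import median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # hypothetical data, long right tail
ybar = sum(right_skewed) / len(right_skewed)
deviations = [y - ybar for y in right_skewed]

# Positive and negative deviations cancel: their sum is 0 (up to rounding),
# so the "average deviation" would always be 0.
print(sum(deviations))

# Squared deviations are never negative, so they cannot cancel.
print(sum(d ** 2 for d in deviations))

# The long right tail pulls the mean toward it; the median stays put.
print(ybar, median(right_skewed))          # 4.75 3.0
```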
True or False Questions:
1) “The outlier has huge effects on the mean; therefore, the IQR and the variance are greatly affected as well.”
2) “The mean is larger than the median for distributions skewed to the right.”
3) “Knowing that a distribution is skewed left, it's much better to use the mean and standard deviation to figure out center and spread.”
4) “A distribution skewed left implies that the upper quartile is farther from the median than the lower quartile is.”
Standardizing Using Z-Scores
Standardizing means expressing a value as the number of standard deviations it lies from the mean.
Why? What's the point? We do this for many reasons, but one is to compare how “remarkable” or “impressive” a statistic or phenomenon is. The more standard deviations away from the mean something is, the more “WOW, that's crazy” it is. A good example is scoring in basketball versus soccer/football: if I told you the Los Angeles Lakers scored 100 points in their game yesterday, you might say “uhm, okay”. But if I told you F.C. Barcelona scored 100 goals in their game yesterday, you'd say “WOW!” The numbers alone don't tell you anything without the context of the mean and standard deviation.
To find a z-score:
$z = \frac{y - \bar{y}}{s}$
Since the deviation is divided by the standard deviation, the units cancel: z-scores have no units. A z-score is positive for values above the mean and negative for values below it.
Changes of Units
Adding or Subtracting a Constant: This shifts the distribution right or left. Measures of position (center, percentiles, min, max) change by the amount of the constant, BUT measures of spread (IQR, standard deviation) stay the same.
(Figure: the entire distribution shifts when a constant is added, but its spread remains the same.)
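A minimal Python sketch of the z-score formula and of adding a constant, using hypothetical scores and the `statistics` module's `mean` and `stdev` (`stdev` divides by n - 1, matching the standard deviation formula above). The helper name `z_scores` is our own.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize each value: z = (y - ybar) / s."""
    ybar, s = mean(values), stdev(values)    # stdev divides by n - 1
    return [(y - ybar) / s for y in values]


scores = [60.0, 70.0, 80.0, 90.0, 100.0]     # hypothetical data
shifted = [y + 10 for y in scores]           # add the constant 10 to every value

print(mean(scores), mean(shifted))           # 80.0 90.0: the center moves by 10
print(stdev(scores), stdev(shifted))         # equal: the spread is unchanged
print(z_scores(scores))                      # negative below the mean, 0 at it, positive above
```

Because a shift moves every value and the mean by the same amount, the z-scores of `shifted` match those of `scores`.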
A real-life example of adding a constant is converting data from degrees Celsius to kelvins: we add 273.15 to every data point. The mean and median increase by 273.15, but the IQR and the standard deviation stay the same.
Multiplying or Dividing by a Constant: Most changes of units involve a change of scale rather than simple addition or subtraction. A change of scale multiplies or divides every data point by a constant. Multiplying the data by a constant greater than 1 stretches the spread of the distribution.
(Figure: converting kilograms to pounds multiplies each data value by about 2.2.)
Dividing the data by a constant greater than 1 compresses the spread of the distribution. When we multiply or divide the data by a positive constant, all measures of position and spread (mean, median, percentiles, IQR, standard deviation) are multiplied or divided by that same constant. (The variance, being in squared units, is multiplied or divided by the square of the constant.)
Implications for Standardization
When we standardize, we subtract a constant (the mean) from every point and then rescale by another constant (the standard deviation):
$z = \frac{y - \bar{y}}{s}$
Subtracting the mean shifts the mean of the distribution to 0.
Dividing by the standard deviation rescales the distribution to a standard deviation of 1.
So a z-score of 1 means the value is 1 standard deviation above the mean, and a z-score of -1 means it is 1 standard deviation below the mean.
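A sketch of rescaling and standardizing with hypothetical weights; the factor 2.2 is the approximate kilogram-to-pound conversion mentioned above. The computations use floating-point arithmetic, so the ratios equal 2.2, and the standardized mean and SD equal 0 and 1, only up to rounding.

```python
from statistics import mean, median, stdev

kg = [50.0, 60.0, 65.0, 80.0]          # hypothetical weights in kilograms
lb = [2.2 * w for w in kg]             # approximate conversion to pounds

# Multiplying by a constant multiplies the center AND the spread by it,
# so each ratio below is 2.2 up to floating-point rounding.
print(mean(lb) / mean(kg))
print(median(lb) / median(kg))
print(stdev(lb) / stdev(kg))

# Standardizing: subtract the mean (shift), then divide by the SD (rescale).
ybar, s = mean(kg), stdev(kg)
z = [(w - ybar) / s for w in kg]
print(mean(z), stdev(z))               # essentially 0 and 1
```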
5) A town’s January high temperatures average 36°F with a standard deviation of 10°F, while in July the mean high temperature is 74°F and the standard deviation is 8°F. In which month is it more unusual to have a day with a high temperature of 55°F? Explain.
6) Each year thousands of high school students take either the SAT or the ACT, standardized tests used in the college admission process. Combined SAT Math and Verbal scores go as high as 1600, while the maximum ACT composite score is 36. Since the two exams use very different scales, comparisons of performance are difficult. A convenient rule of thumb is SAT = 40 × ACT + 150; that is, multiply an ACT score by 40 and add 150 points to estimate the equivalent SAT score. An admissions officer reported the following statistics about the ACT scores of 2355 students who applied to her college one year. Find the summaries of the equivalent SAT scores:
Summary Statistics (ACT scores)
Lowest Score: 19
Mean: 27
Standard Deviation: 3
Third Quartile: 30
Median: 28
Interquartile Range: 6
Scatterplots
Scatterplots are the best way to see the relationship between two quantitative variables and to visualize an association between them.
Direction of Scatterplots:
Negative Association: Y decreases as X increases. Example: as the outdoor temperature rises, heating costs decrease.
Positive Association: Y increases as X increases. Example: the more SI sessions you attend, the higher your GPA™.
No Association: a random scatter with no correlation. Example: number of eggs at Ralphs vs. car crashes on the I-5.
Also look at form: some scatterplots show curved (for example, parabolic) relationships instead of linear ones. And check for any outliers. The variable of interest is the response variable (the y-variable); the explanatory variable is the x-variable.
Correlation
Correlation measures how strongly two quantitative variables are linearly associated. Its value is unaffected by changes in units.
Finding the Correlation Coefficient:
$r = \frac{\sum z_x z_y}{n - 1}$
The resulting correlation always lies between -1 and 1. A correlation of exactly -1 or +1 means a perfect linear association. The equation shows that to find the correlation we must first standardize both variables. Each product $z_x z_y$ reflects the direction and strength of one data point's contribution, and the products are averaged by dividing by n - 1.
3 Conditions/Assumptions of Correlation
1) Quantitative Variables Condition: Correlation only measures associations between quantitative variables.
2) Straight Enough Condition: Use your best judgment to check whether the relationship is reasonably straight.
(Figures: a scatterplot that passes the condition and one that doesn't.)
3) No Outliers Condition: Outliers can have huge effects on the correlation and misrepresent the strength of the association. An outlier can even change the sign of the correlation.
(Figure: an outlier makes the coefficient nearly one; without it, the coefficient is close to zero.)
(Figure: an outlier makes the coefficient nearly zero; without it, the coefficient is nearly one.)
Correlation Properties
• The sign of a correlation coefficient gives the direction of the association.
• Correlation is always between -1 and +1. It can equal exactly -1.0 or +1.0, but these values are unusual in real data.
• The correlation of x with y is the same as the correlation of y with x.
• Correlation has no units.
• Correlation is not affected by changes in center or (positive) scale of either variable, because it depends only on z-scores, which are unaffected by such changes.
• Correlation measures the strength of the linear association between two variables. Variables can be strongly associated but still have a small correlation if the association isn't linear.
• Correlation is sensitive to outliers.
Scatterplots and correlation coefficients never prove causation.
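The z-score formula for r, and several of the properties above, can be checked with a short Python sketch. The function name `correlation_r` and all data are hypothetical; the scale check uses a positive multiplier, since multiplying one variable by a negative constant would flip the sign of r.

```python
from statistics import mean, stdev


def correlation_r(x, y):
    """r = sum(z_x * z_y) / (n - 1), with z-scores built from n - 1 SDs."""
    n = len(x)
    xbar, sx = mean(x), stdev(x)
    ybar, sy = mean(y), stdev(y)
    return sum(((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y)) / (n - 1)


x = [1, 2, 3, 4]
y = [2, 4, 5, 9]                                  # hypothetical paired data
r = correlation_r(x, y)

print(-1 <= r <= 1)                               # True: r always lies in [-1, 1]
print(correlation_r(y, x) == r)                   # True: order does not matter
x_rescaled = [10 * v + 3 for v in x]              # change of scale and center
print(abs(correlation_r(x_rescaled, y) - r) < 1e-12)  # True: r is unchanged

# Sensitivity to outliers: a patternless cloud has r of essentially 0,
# but adding one extreme point pushes r to about 0.97.
cloud_x, cloud_y = [1, 2, 3, 4, 5], [3, 1, 4, 1, 3]
print(correlation_r(cloud_x, cloud_y))
print(correlation_r(cloud_x + [20], cloud_y + [20]))
```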
7) Assuming the conditions of correlation are met, explain the validity of each statement:
a) Multiplying every value of x by 2 will double the correlation.
b) Standardizing the variables will make the correlation 0.
8) Explain the mistakes in the following statements:
a) “My very low correlation of -0.772 shows that there is almost no association between GDP and Infant Mortality Rate.”
b) “There was a correlation of 0.44 between GDP and Continent.”
9) The correlation between Age and Income as measured on 100 people is r = 0.75. Explain, looking at just this information, whether or not each of these possible conclusions is justified:
a) When Age increases, Income increases as well.
b) The form of the relationship between Age and Income is straight.
c) There are no outliers in the scatterplot of Income vs. Age.
d) Whether we measure Age in years or months, the correlation will still be 0.75.
The Law of Large Numbers
As we repeat a random process over and over, the proportion of times an event occurs settles down to a single number, as long as the random phenomenon doesn't change and the trials are independent (the outcome of one trial doesn't affect the outcome of another). The LLN guarantees that relative frequencies settle down in the long run to a probability, called the empirical probability:
$P(A) = \frac{\#\text{ of times } A \text{ occurs}}{\text{total } \#\text{ of trials}}$
The Nonexistent Law of Averages
The Law of Large Numbers describes only long-run behavior, not the short run, and the long run is REALLY, REALLY long. Although relative frequencies converge to the probability over a long period, they can't be used to predict outcomes in the short run.
Ex: “We flip a coin six times and it's been heads every single time. We predict the next flip will be tails, since the distribution must be 50/50.” This is wrong: the flips are independent, so the next flip is still heads with probability 1/2.
Probability Assignment Rule: The set of all possible outcomes (the sample space) must have probability 1.
The set of outcomes that are not in event A is called the complement of A, denoted $A^C$.
Complement Rule: The probability of an event not occurring is 1 minus the probability that it does occur: $P(A^C) = 1 - P(A)$.
(Figure: A and its complement $A^C$ together make up the entire sample space.)
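A small simulation of the Law of Large Numbers, assuming a fair coin modeled with Python's `random` module (seeded so the run is reproducible). The helper name `proportion_heads` is hypothetical. Short runs can land well away from 0.5; only long runs reliably settle near it. The last lines apply the complement rule with exact fractions.

```python
import random
from fractions import Fraction


def proportion_heads(n_flips, seed=0):
    """Simulate n_flips fair-coin flips and return the fraction that are heads."""
    rng = random.Random(seed)                  # seeded for a reproducible run
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# Short runs can wander far from 0.5; long runs settle close to it.
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_heads(n))

# Complement rule on one roll of a fair die: P(not a 6) = 1 - P(6)
p_six = Fraction(1, 6)
print(1 - p_six)                               # 5/6
```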
At Least and Some
Many probability problems involve the phrases “at least” and “some”. This is usually a hint (though not always) that a complement should be used to solve the problem. If event A happens at least once, we count that as the event happening. Therefore, the complement of “A doesn't happen at all” is NOT “A happens every time”; it is “A happens at least once”. The situation where A happens on every trial is a subset of the situation where A happens at least once, not the other way around.
With One Trial
$P(A) + P(A^C) = 1$
With Multiple Trials
$P(A \text{ happens at least once}) + P(A \text{ doesn't happen at all}) = 1$
$P(A \text{ happens at least once}) = 1 - P(A \text{ doesn't happen at all})$
OR vs. AND
Addition Rule: For two disjoint events A and B, the probability that one or the other occurs is the sum of the probabilities of the two events:
$P(A \text{ or } B) = P(A) + P(B)$, provided A and B are disjoint.
Disjoint events can never occur together. Many disjoint events arise within a single trial. Ex: “When a baby is born, it can be born male or female, but not both.” The sum of the probabilities of two disjoint events must be less than or equal to one. If the sum is greater than one, then either:
a) the probabilities are wrong, or
b) the events are not actually disjoint (mutually exclusive), so it's possible for them to occur together.
Multiplication Rule: For two independent events A and B, the probability that both A and B occur is the product of the probabilities of the two events:
$P(A \text{ and } B) = P(A) \times P(B)$, provided A and B are independent.
(Figure: the event “A and B” is the intersection, or overlap, of A and B.)
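The “at least once” shortcut, the multiplication rule, and the addition rule can be checked with exact fractions in Python. The example (three flips of a fair coin, one roll of a fair die) is our own, not from the course; the brute-force count lists every equally likely outcome to confirm the complement calculation.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: flip a fair coin 3 times; A = "heads" on a flip.
p_heads = Fraction(1, 2)

# Multiplication rule (independent flips): P(no heads at all) = (1/2)^3
p_no_heads = (1 - p_heads) ** 3
# Complement: P(at least one head) = 1 - P(no heads at all)
p_at_least_one = 1 - p_no_heads
print(p_at_least_one)                            # 7/8

# Brute-force check: list all 8 equally likely outcomes and count.
outcomes = list(product("HT", repeat=3))
favorable = sum("H" in flips for flips in outcomes)
print(Fraction(favorable, len(outcomes)))        # 7/8

# Addition rule (disjoint events on one die roll): P(1 or 2) = 1/6 + 1/6
print(Fraction(1, 6) + Fraction(1, 6))           # 1/3
```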
10) A consumer organization estimates that over a 1-year period, 17% of cars will need to be repaired only once, 7% will need repairs exactly twice, and 4% will require three or more repairs. What is the probability that a car chosen at random will need:
a) No repairs?
b) No more than one repair?
c) Some repairs?
11) The Mars company says that before the introduction of purple, yellow candies made up 20% of their plain M&M's, red another 20%, and orange, blue, and green each made up 10%. The rest were brown.
a) If you pick an M&M at random, what is the probability that:
1. It is brown?
2. It is yellow or orange?
3. It is not green?
4. It is striped?
b) If you pick three M&M's in a row, what is the probability that:
1. They are all brown?
2. The third one is the first one that's red?
3. None are yellow?
4. At least one is green?
12) You roll a fair die three times. What is the probability that:
a) You roll all 6's?
b) You roll all odd numbers?
c) None of your rolls gets a number divisible by 3?
d) You roll at least one 5?
e) The numbers you roll are not all 5's?