Chapter 3A: Applied Statistics in Business and Economics, Doane/Seward, 2nd Edition


Chapter 3

Describing Data Visually (Part 1)

• Visual Description
• Dot Plots
• Frequency Distributions and Histograms
• Line Charts
• Bar Charts

McGraw-Hill/Irwin

Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.


Visual Description
• Methods of organizing, exploring and summarizing data include:
  - Visual (charts and graphs) provides insight into characteristics of a data set without using mathematics.
  - Numerical (statistics or tables) provides insight into characteristics of a data set using mathematics.


Visual Description
• Begin with univariate data (a set of n observations on one variable) and consider the following:

Characteristic | Interpretation
Measurement | What are the units of measurement? Are the data integer or continuous? Any missing observations? Any concerns with accuracy or sampling methods?
Central Tendency | Where are the data values concentrated? What seem to be typical or middle data values?


Visual Description

Characteristic | Interpretation
Dispersion | How much variation is there in the data? How spread out are the data values? Are there unusual values?
Shape | Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?


Visual Description
• Example: Price/Earnings Ratios
  - P/E ratios are current stock price divided by earnings per share in the last 12 months.

[Table: example P/E ratios]
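As a minimal sketch of the definition above, the ratio can be computed as follows. The function name `pe_ratio` and the sample price and earnings figures are hypothetical, chosen only for illustration. Returning `None` when earnings are zero or negative is an assumption of this sketch, not something the slides specify.

```python
def pe_ratio(price, earnings_per_share):
    """Current stock price divided by earnings per share (last 12 months)."""
    # Assumption: treat the ratio as undefined when earnings are not positive.
    if earnings_per_share <= 0:
        return None
    return price / earnings_per_share


# Hypothetical stock: price of 50.00 with earnings of 2.50 per share.
print(pe_ratio(50.0, 2.5))
```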


Visual Description
• Measurement
  - Look at the data and visualize how it was collected and measured.
• Sorting
  - Sort the data and then summarize in a graphical display.

[Table: sorted P/E ratios]

• A histogram graphically displays sorted data.


Visual Description
• Sorting
  - Sorting allows you to observe central tendency, dispersion and shape as well as minimum, maximum and range.
  - When the number of observations is large, a sorted list of data values is difficult to analyze.
  - To see broader patterns in the data, analysts often prefer a visual display of the data.
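A short sketch of what sorting gives you directly: the minimum, maximum and range. The ten P/E values below are hypothetical (the original data table is not reproduced here); they were chosen so that the smallest and largest values are 8 and 68.

```python
# Hypothetical P/E ratios, for illustration only.
pe_ratios = [19, 8, 27, 14, 68, 22, 11, 35, 16, 20]

sorted_ratios = sorted(pe_ratios)       # ascending order
smallest = sorted_ratios[0]             # minimum
largest = sorted_ratios[-1]             # maximum
data_range = largest - smallest         # range = maximum - minimum

print(sorted_ratios)
print("min:", smallest, "max:", largest, "range:", data_range)
```

With only ten values the sorted list is easy to scan; with hundreds of values the same list becomes hard to read, which is the case for a visual display.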


Dot Plots
• A dot plot is the simplest graphical display of n individual values of numerical data.
  - Easy to understand
  - Not good for large samples (e.g., > 5,000).

Steps in Making a Dot Plot
1. Make a scale that covers the data range.
2. Mark the axes and label them.
3. Plot each data value as a dot above the scale at its approximate location.

If more than one data value lies at about the same axis location, the dots are piled up vertically.
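The steps above can be sketched as a plain-text dot plot. This is only an illustration, not how MegaStat or any other package draws one. The function name `dot_plot`, the scale start of 100 and the column width of 20 are assumptions of the sketch; the data are the nine median home prices (in thousands) listed later in this chapter.

```python
from collections import Counter


def dot_plot(values, start, step):
    """Return the text lines of a dot plot; each 'o' is one data value."""
    # Step 3: place each value at its approximate location on the scale.
    columns = Counter(round((v - start) / step) for v in values)
    n_cols = max(columns) + 1
    height = max(columns.values())
    lines = []
    # Values at about the same location are piled up vertically.
    for level in range(height, 0, -1):
        row = "".join("o" if columns[c] >= level else " " for c in range(n_cols))
        lines.append(row.rstrip())
    # Steps 1 and 2: a scale that covers the data range, with a label.
    lines.append("-" * n_cols)
    lines.append(f"scale: {start} to {start + (n_cols - 1) * step}, one column = {step}")
    return lines


home_prices = [119.6, 363.0, 170.4, 181.7, 198.5, 186.2, 173.8, 560.2, 100.7]
plot_lines = dot_plot(home_prices, start=100, step=20)
print("\n".join(plot_lines))
```

The choice of column width matters: a wider column piles more values together, a narrower one spreads them out.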


Creating a Dot Plot in MegaStat



Dot Plots
• Range of data shows dispersion.
• Clustering shows central tendency.
• Dot plots do not tell much about the shape of the distribution.
• Annotations (text boxes) can be added to call attention to specific features.


Dot Plots
• Small Sample: Home Prices
  - Consider the following median home prices for nine U.S. cities.

Metropolitan Area | Median Home Price ($000)
Akron OH | 119.6
Bergen-Passaic NJ | 363.0
Bradenton FL | 170.4
Colorado Springs CO | 181.7
Hartford CT | 198.5
Milwaukee WI | 186.2
Raleigh-Durham NC | 173.8
San Francisco CA | 560.2
Topeka KS | 100.7


Dot Plots
Small Sample: Home Prices
• A dot plot is useful to realtors as they discuss patterns in home selling prices within their community.


Dot Plots
Comparing Groups
• A stacked dot plot compares two or more groups using a common X-axis scale.


Frequency Distributions and Histograms
Bins and Bin Limits
• A frequency distribution is a table formed by classifying n data values into k classes (bins).
• Bin limits define the values to be included in each bin. Widths must all be the same.
• Frequencies are the number of observations within each bin.
• Frequencies can also be expressed as relative frequencies (frequency divided by the total) or percentages (relative frequency times 100).


Frequency Distributions and Histograms
Constructing a Frequency Distribution
1. Find the smallest and largest data values.
2. Choose the number of bins (k).
   - k should be much smaller than n.
   - Too many bins results in sparsely populated bins; too few, and dissimilar data values are lumped together.


Frequency Distributions and Histograms
Constructing a Frequency Distribution
- Herbert Sturges proposed the following rule:

Sample Size (n) | Suggested Number of Bins (k)
16 | 5
32 | 6
64 | 7
128 | 8
256 | 9
512 | 10
1024 | 11
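The table above matches the usual statement of Sturges' rule, k = 1 + log2(n), which is approximately 1 + 3.3 log10(n). A minimal sketch follows; the function name `sturges_bins` is hypothetical, and rounding to the nearest integer for sample sizes that are not powers of 2 is an assumption of the sketch.

```python
import math


def sturges_bins(n):
    """Suggested number of bins: k = 1 + log2(n), rounded to the nearest integer."""
    return round(1 + math.log2(n))


# Reproduce the sample sizes shown in the table.
for n in (16, 32, 64, 128, 256, 512, 1024):
    print(n, sturges_bins(n))
```

The rule is only a starting point; the final choice of k usually also depends on getting convenient bin limits.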


Frequency Distributions and Histograms
Constructing a Frequency Distribution
3. Set the bin limits:

$$\text{Bin width} \approx \frac{X_{\max} - X_{\min}}{k}$$

For example, for k = 7 bins, the approximate bin width is:

$$\text{Bin width} \approx \frac{68 - 8}{7} = \frac{60}{7} = 8.57$$

To obtain “nice” limits, we round the width to 10 and start the first bin at 0 to get bin limits: 0, 10, 20, 30, 40, 50, 60, 70
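The calculation above can be sketched in code. The slides do not define what counts as a "nice" width, so the rule used here (round the raw width up to 1, 2, 5 or 10 times a power of ten, and start at a multiple of the width) is an assumption, and the names `nice_width` and `bin_limits` are hypothetical.

```python
import math


def nice_width(raw_width):
    """Round a raw bin width up to 1, 2, 5 or 10 times a power of ten (assumed rule)."""
    base = 10 ** math.floor(math.log10(raw_width))
    for multiple in (1, 2, 5, 10):
        if multiple * base >= raw_width:
            return multiple * base


def bin_limits(x_min, x_max, k):
    """Bin limits that start at a multiple of the width and cover the data range."""
    raw_width = (x_max - x_min) / k            # (68 - 8) / 7 = 8.57 in the example
    width = nice_width(raw_width)
    start = math.floor(x_min / width) * width  # first limit at or below the minimum
    limits = [start]
    # Keep adding limits until the maximum lies below the last (excluded) upper limit.
    while limits[-1] <= x_max:
        limits.append(limits[-1] + width)
    return limits


print(bin_limits(8, 68, 7))
```

Rounding the width can change the number of bins actually produced, so it is worth checking the result against the k you intended.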


Frequency Distributions and Histograms
Constructing a Frequency Distribution
4. Put the data values in the appropriate bin.
   In general, the lower limit is included in the bin while the upper limit is excluded.
5. Create the table. You can include:
   - Frequencies – counts for each bin.
   - Relative frequencies – absolute frequency divided by total number of data values.
   - Cumulative frequencies – accumulated relative frequency values as bin limits increase.
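Steps 4 and 5 can be sketched as follows. The data values are hypothetical P/E ratios (the original data table is not reproduced here), and the function name `frequency_table` is an assumption. Each value is placed in the bin whose lower limit it meets or exceeds and whose upper limit it falls below.

```python
def frequency_table(values, limits):
    """Return rows of (lower, upper, frequency, relative, cumulative relative)."""
    n = len(values)
    rows = []
    cumulative = 0.0
    for lower, upper in zip(limits, limits[1:]):
        # Lower limit included, upper limit excluded.
        count = sum(1 for v in values if lower <= v < upper)
        relative = count / n
        cumulative += relative
        rows.append((lower, upper, count, relative, cumulative))
    return rows


# Hypothetical data and the bin limits 0, 10, ..., 70.
pe_values = [19, 8, 27, 14, 68, 22, 11, 35, 16, 20]
table = frequency_table(pe_values, [0, 10, 20, 30, 40, 50, 60, 70])
for lower, upper, count, relative, cumulative in table:
    print(f"{lower:>2} to < {upper:<2}  {count:>2}  {relative:.2f}  {cumulative:.2f}")
```

Note that the value 20 is counted in the bin from 20 to under 30, not in the bin from 10 to under 20.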




Frequency Distributions and Histograms
Histograms
• A histogram is a graphical representation of a frequency distribution.
• A histogram is a bar chart.
  - The Y-axis shows the frequency within each bin.
  - The X-axis ticks show the end points of each bin.


Frequency Distributions and Histograms
Histograms
• Consider three histograms for the P/E ratio data with different bin widths. What do they tell you?


Frequency Distributions and Histograms
Excel’s Histogram


Frequency Distributions and Histograms
MegaStat's Frequency Distribution and Histograms


Frequency Distributions and Histograms
MINITAB Histogram


Frequency Distributions and Histograms
Modal Class
• A histogram bar that is higher than those on either side.
• Unimodal – a single modal class.
• Bimodal – two modal classes.
• Multimodal – more than two modal classes.
• Modal classes may be artifacts of the way bin limits are chosen.
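A minimal sketch of the definition above: a modal class is a bin whose frequency is higher than the frequencies of its neighbors. The bin frequencies are hypothetical and the function names are assumptions. Treating a first or last bin as modal when it is higher than its single neighbor is also an assumption of the sketch.

```python
def modal_classes(frequencies):
    """Return the indexes of bins that are higher than the bins on either side."""
    modes = []
    last = len(frequencies) - 1
    for i, freq in enumerate(frequencies):
        # Assumption: a missing neighbor at either end counts as lower.
        left = frequencies[i - 1] if i > 0 else -1
        right = frequencies[i + 1] if i < last else -1
        if freq > left and freq > right:
            modes.append(i)
    return modes


def describe_modality(frequencies):
    """Label the histogram by its number of modal classes."""
    count = len(modal_classes(frequencies))
    if count == 1:
        return "unimodal"
    if count == 2:
        return "bimodal"
    return "multimodal" if count > 2 else "no modal class"


# Hypothetical bin frequencies with peaks in the third and sixth bins.
bin_frequencies = [2, 7, 12, 6, 3, 8, 4]
print(modal_classes(bin_frequencies), describe_modality(bin_frequencies))
```

Because the result depends only on the bin counts, changing the bin limits can add or remove a modal class for the same data.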


Frequency Distributions and Histograms
Shape
• A histogram suggests the shape of the population.
• It is influenced by the number of bins and the bin limits.
• Skewness – indicated by the direction of the longer tail of the histogram.
  - Left-skewed – (negatively skewed) a longer left tail.
  - Right-skewed – (positively skewed) a longer right tail.
  - Symmetric – both tail areas approximately the same.




Frequency Distributions and Histograms
Tips for Effective Frequency Distributions
• Check Sturges’ Rule first.
• Choose a nice, round bin width.
• Choose bin limits that are multiples of the bin width.
• Make sure that the range is covered.


Frequency Polygon and Ogive



Line Charts
Simple Line Charts
• Used to display a time series or spot trends, or to compare time periods.
• Can display several variables at once.


Line Charts
Simple Line Charts
• Two-scale line chart – used to compare variables that differ in magnitude or are measured in different units.


Line Charts
Grid Lines
• A line graph usually has no vertical grid lines. Horizontal lines can be added to make it easier to establish the y value. Which is easier to read?


Line Charts
Log Scales
• Arithmetic scale – distances on the Y-axis are proportional to the magnitude of the variable being displayed.
• Logarithmic scale – (ratio scale) equal distances represent equal ratios.
• Use a log scale for the vertical axis when data vary over a wide range, say, by more than an order of magnitude.
• This will reveal more detail for small data values.


Line Charts
Log Scales
• A log scale is only suited for positive data values.
• It reveals whether the quantity is growing at an increasing percent (concave upward), constant percent (straight line), or declining percent (concave downward).
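A small sketch of why constant percent growth plots as a straight line on a log scale: the logarithms of the values rise by the same amount each period, while the raw values rise by ever larger amounts. The series (10 percent growth from a starting value of 100) is hypothetical and the function name `log_steps` is an assumption.

```python
import math


def log_steps(series):
    """Period-to-period change in log10 of the series (the slope on a log scale)."""
    # math.log10 raises ValueError for zero or negative values,
    # which is why a log scale only suits positive data.
    logs = [math.log10(value) for value in series]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


# Hypothetical series growing at a constant 10 percent per period.
constant_growth = [100 * 1.10 ** t for t in range(6)]

arithmetic_steps = [b - a for a, b in zip(constant_growth, constant_growth[1:])]
steps = log_steps(constant_growth)

print([round(s, 2) for s in arithmetic_steps])  # increasing: curve bends upward
print([round(s, 4) for s in steps])             # constant: straight line on a log scale
```

If the log steps grew over time, the line would be concave upward (increasing percent growth); if they shrank, it would be concave downward.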


Line Charts
Example: U.S. Trade
• What does the log scale graph tell you about the growth rate of both series?


Line Charts
When to Use Log Scales
• Useful for
  - time series data that might be expected to grow at a compound annual percentage rate (e.g., GDP, national debt, future income)
  - financial charts that cover long periods of time or data that grow rapidly (e.g., revenues)


Line Charts
Tips for Effective Line Charts
1. Line charts are used for time series data (never for cross-sectional data).
2. The Y-axis shows the numerical variable while the X-axis shows time units, with time increasing left to right.
3. Use a zero origin on the Y-axis unless more detail is needed.


Line Charts
Tips for Effective Line Charts
4. Omit numerical labels on a line chart to avoid clutter. Use gridlines if needed.
5. Use data markers (squares, triangles, circles) if they don’t clutter the graph.
6. Don’t make lines too thick.


Bar Charts
Plain Bar Charts
• Most common way to display attribute data.
  - Bars represent categories or attributes.
  - Lengths of bars represent frequencies.

[Figures: Vertical Bar Chart; Horizontal Bar Chart]


Bar Charts
3-D and Novelty Bar Charts

[Figures: 3-D Bar Chart; Pyramid Chart]


Bar Charts
Pareto Charts
• Special type of bar chart used in quality management to display the frequency of defects or errors of different types.
• Categories are displayed in descending order of frequency.
• Focus on the significant few (i.e., the few categories that account for most defects or errors).
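The ordering behind a Pareto chart can be sketched as below. The defect categories and counts are hypothetical, and the function name `pareto_table` is an assumption. The cumulative percent column is not mentioned on the slide; it is added here as one common way to see how few categories account for most defects.

```python
def pareto_table(counts):
    """Return (category, frequency, cumulative percent) in descending order of frequency."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows


# Hypothetical defect counts, for illustration only.
defects = {"scratch": 12, "dent": 45, "misalignment": 8, "paint": 30, "other": 5}
pareto_rows = pareto_table(defects)
for category, count, cumulative_pct in pareto_rows:
    print(f"{category:<14}{count:>4}{cumulative_pct:>8.1f}%")
```

In this made-up example the first two categories account for three quarters of all defects, which is the "significant few" pattern the chart is meant to expose.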


Bar Charts
Stacked Bar Chart
• Bar height is the sum of several subtotals. Areas may be compared by color to show patterns in the subgroups and total.


Bar Charts
Bar Charts for Time Series Data
• Bar charts can be used for time series data, although it may be harder to compare trends.


Bar Charts
Tips for Effective Bar Charts
1. Show the numerical variable of interest on the Y-axis with vertical bars, and the category labels on the X-axis.
2. For time series quantities, display the category labels on the horizontal X-axis with time increasing from left to right.
3. The height or length of each bar should be proportional to the quantity displayed.
4. Put numerical values at the top of each bar, unless the chart becomes too cluttered.


Applied Statistics in Business and Economics

End of Chapter 3A


