FromProf-Chap2-AppliedStats/Bus&Eco- Doane-2E


Chapter 2
Data Collection

Data Vocabulary
Level of Measurement
Time Series and Cross-sectional Data
Sampling Concepts
Sampling Methods
Data Sources
Data Errors


Data Vocabulary
Subjects, Variables, Data Sets
• We will treat “data” as plural and use “data set” for a particular collection of data taken as a whole.
• Observation – each data value.
• Subject (or individual) – an item for study (e.g., an employee in your company).
• Variable – a characteristic about the subject or individual (e.g., employee’s income).


Data Vocabulary
Subjects, Variables, Data Sets
• Three types of data sets:

Data Set       Variables       Typical Tasks
Univariate     One             Histograms, descriptive statistics, frequency tallies
Bivariate      Two             Scatter plots, correlations, simple regression
Multivariate   More than two   Multiple regression, data mining, econometric modeling


Data Vocabulary
Subjects, Variables, Data Sets
Consider a multivariate data set with 5 variables and 8 subjects: 5 x 8 = 40 observations.
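The count above can be pictured as a table with one row per subject and one column per variable. The sketch below is only an illustration: the variable names and the empty cells are made up, not taken from the slides.

```python
# Hypothetical multivariate data set: 8 subjects (rows), 5 variables (columns).
variables = ["age", "income", "position", "gender", "years_employed"]
subjects = [f"employee_{i}" for i in range(1, 9)]

# One observation per (subject, variable) pair; None stands in for a real value.
data_set = {s: {v: None for v in variables} for s in subjects}

# Total number of observations = variables x subjects.
n_observations = sum(len(row) for row in data_set.values())
print(len(variables), len(subjects), n_observations)
```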


Data Vocabulary
Data Types
• A data set may have a mixture of data types.

Types of Data
Attribute (qualitative)
  - Verbal label: X = economics (your major)
  - Coded: X = 3 (i.e., economics)
Numerical (quantitative)
  - Discrete: X = 2 (your siblings)
  - Continuous: X = 3.15 (your GPA)


Data Vocabulary
Attribute Data
• Also called categorical, nominal or qualitative data.
• Values are described by words rather than numbers.
• For example,
  - Automobile style (e.g., X = full, midsize, compact, subcompact).
  - Mutual fund (e.g., X = load, no-load).


Data Vocabulary
Data Coding
• Coding refers to using numbers to represent categories to facilitate statistical analysis.
• Coding an attribute as a number does not make the data numerical.
• For example, 1 = Bachelor’s, 2 = Master’s, 3 = Doctorate
• Rankings may exist, for example, 1 = Liberal, 2 = Moderate, 3 = Conservative
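A short sketch of why coded attributes stay qualitative. The list of responses is invented for illustration; only the coding scheme (1 = Bachelor’s, 2 = Master’s, 3 = Doctorate) comes from the slide.

```python
from collections import Counter
from statistics import mean

# Coding scheme from the slide; the responses themselves are hypothetical.
labels = {1: "Bachelor's", 2: "Master's", 3: "Doctorate"}
coded = [1, 1, 2, 3, 1, 2, 2, 1]

# Counting codes is meaningful: it gives a frequency tally per category.
tally = Counter(labels[c] for c in coded)
print(tally)

# Averaging the codes runs without error, but the result corresponds to no degree.
average_code = mean(coded)
print(average_code)
```

The arithmetic succeeds, yet an average code between 1 and 2 has no interpretation as a degree, which is the point of the second bullet.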


Data Vocabulary
Numerical Data
• Numerical or quantitative data arise from counting or some kind of mathematical operation.
• For example,
  - Number of auto insurance claims filed in March (e.g., X = 114 claims).
  - Ratio of profit to sales for last quarter (e.g., X = 0.0447).
• Can be broken down into two types – discrete or continuous data.


Data Vocabulary
Discrete Data
• A numerical variable with a countable number of values that can be represented by an integer (no fractional values).
• For example,
  - Number of Medicaid patients (e.g., X = 2).
  - Number of takeoffs at O’Hare (e.g., X = 37).


Data Vocabulary
Continuous Data
• A numerical variable that can have any value within an interval (e.g., length, weight, time, sales, price/earnings ratios).
• Any continuous interval contains infinitely many possible values (e.g., 426 < X < 428).


Level of Measurement
Four levels of measurement for data:

Level of Measurement   Characteristics          Example
Nominal                Categories only          Eye color (blue, brown, green, hazel)
Ordinal                Rank has meaning         Bond ratings (Aaa, Aab, C, D, F, etc.)
Interval               Distance has meaning     Temperature (57° Celsius)
Ratio                  Meaningful zero exists   Accounts payable ($21.7 million)


Level of Measurement
Nominal Measurement
• Nominal data merely identify a category.
• Nominal data are qualitative, attribute, categorical or classification data (e.g., Apple, Compaq, Dell, HP).
• Nominal data are usually coded numerically, but the codes are arbitrary (e.g., 1 = Apple, 2 = Compaq, 3 = Dell, 4 = HP).
• The only permissible mathematical operations are counting (e.g., frequencies) and simple statistics.


Level of Measurement
Ordinal Measurement
• Ordinal data codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).
• Distance between codes is not meaningful (e.g., distance between 1 and 2, or between 2 and 3, or between 3 and 4 lacks meaning).
• Many useful statistical tests exist for ordinal data. Especially useful in social science, marketing and human resource research.


Level of Measurement
Interval Measurement
• Data can not only be ranked, but also have meaningful intervals between scale points (e.g., difference between 60°F and 70°F is same as difference between 20°F and 30°F).
• Since intervals between numbers represent distances, mathematical operations can be performed (e.g., average).
• Zero point of interval scales is arbitrary, so ratios are not meaningful (e.g., 60°F is not twice as warm as 30°F).
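One way to see why interval-scale ratios are not meaningful is to express the same temperatures on a second scale. The sketch below uses the standard Fahrenheit-to-Celsius conversion; the function name is chosen here for illustration.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Equal intervals stay equal after a change of scale:
# 60F -> 70F and 20F -> 30F are the same distance in Celsius too.
gap_high = fahrenheit_to_celsius(70) - fahrenheit_to_celsius(60)
gap_low = fahrenheit_to_celsius(30) - fahrenheit_to_celsius(20)
print(gap_high, gap_low)

# Ratios do not survive: 60/30 is 2 in Fahrenheit, but the same two
# temperatures give a completely different (negative) ratio in Celsius.
ratio_f = 60 / 30
ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)
print(ratio_f, ratio_c)
```

Because the ratio depends on which arbitrary zero point is used, "twice as warm" has no scale-independent meaning.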


Level of Measurement
Likert Scales
• A special case of interval data frequently used in survey research.
• The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).

“College-bound high school students should be required to study a foreign language.” (check one)
  - Strongly Agree
  - Somewhat Agree
  - Neither Agree Nor Disagree
  - Somewhat Disagree
  - Strongly Disagree


Level of Measurement
Ambiguity
• Grades are usually coded numerically (A = 4, B = 3, C = 2, D = 1, F = 0) and are used to calculate a mean GPA.
• Is the interval from 3.0 to 4.0 really the same as the interval from 1.0 to 2.0?
• What is the underlying reality ranging from 0 to 4 that we are measuring?
• Best to be conservative and limit statistical tests to those for ordinal data.


Level of Measurement
Ratio Measurement
• Ratio data have all properties of nominal, ordinal and interval data types and also possess a meaningful zero (absence of quantity being measured).
• Because of this zero point, ratios of data values are meaningful (e.g., $20 million profit is twice as much as $10 million).
• Zero does not have to be observable in the data; it is an absolute reference point.


Sampling Concepts
Sample or Census?
• A sample involves looking only at some items selected from the population.
• A census is an examination of all items in a defined population.
• A sampling frame is a list or quasi-list of all elements in the target population.


Sampling Concepts
Situations Where A Sample May Be Preferred:
• Infinite Population: No census is possible if the population is infinite or of indefinite size (an assembly line can keep producing bolts, a doctor can keep seeing more patients).
• Destructive Testing: The act of sampling may destroy or devalue the item (measuring battery life, testing auto crashworthiness, or testing aircraft turbofan engine life).
• Timely Results: Sampling may yield more timely results than a census (checking wheat samples for moisture and protein content, checking peanut butter for aflatoxin contamination).


Sampling Concepts
Situations Where A Sample May Be Preferred:
• Accuracy: Sample estimates can be more accurate than a census. Instead of spreading limited resources thinly to attempt a census, our budget of time and money might be better spent to hire experienced staff, improve training of field interviewers, and improve data safeguards.
• Cost: Even if it is feasible to take a census, the cost, either in time or money, may exceed our budget.
• Sensitive Information: Some kinds of information are better captured by a well-designed sample, rather than attempting a census. Confidentiality may also be improved in a carefully-done sample.


Sampling Concepts
Situations Where A Census May Be Preferred:
• Small Population: If the population is small, there is little reason to sample, for the effort of data collection may be only a small part of the total cost.
• Large Sample Size: If the required sample size approaches the population size, we might as well go ahead and take a census.
• Database Exists: If the data are on disk we can examine 100% of the cases. But auditing or validating data against physical records may raise the cost.
• Legal Requirements: Banks must count all the cash in bank teller drawers at the end of each business day. The U.S. Congress forbade sampling in the 2000 decennial population census.


Sampling Concepts
Parameters and Statistics
• Statistics are computed from a sample of n items, chosen from a population of N items.
• Statistics can be used as estimates of parameters found in the population.
• Symbols are used to represent population parameters and sample statistics.
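The distinction can be sketched numerically. Everything below is an assumption made for illustration: the population values are randomly generated, and the names `mu` and `x_bar` follow the common convention for a population mean and a sample mean, which the slide does not spell out.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of N = 1000 incomes (made-up values).
population = [rng.uniform(20_000, 120_000) for _ in range(1000)]
mu = mean(population)  # parameter: computed from all N items

# Sample of n = 50 items drawn from the population without replacement.
sample = rng.sample(population, 50)
x_bar = mean(sample)  # statistic: computed from the sample only

print(f"population mean (parameter) = {mu:.0f}")
print(f"sample mean (statistic)     = {x_bar:.0f}")
```

The statistic will generally be close to, but not equal to, the parameter it estimates.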


Sampling Methods
Probability Samples

Simple Random Sample   Use random numbers to select items from a list (e.g., VISA cardholders).
Systematic Sample      Select every kth item from a list or sequence (e.g., restaurant customers).
Stratified Sample      Select randomly within defined strata (e.g., by age, occupation, gender).
Cluster Sample         Like stratified sampling except strata are geographical areas (e.g., zip codes).


Sampling Methods
Nonprobability Samples

Judgment Sample      Use expert knowledge to choose “typical” items (e.g., which employees to interview).
Convenience Sample   Use a sample that happens to be available (e.g., ask co-worker opinions at lunch).


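Sampling Methods
Simple Random Sample
• Every item in the population of N items has the same chance of being chosen in the sample of n items.
• We rely on random numbers to select a name. For example, the Excel formula =RANDBETWEEN(1,48) returns a random integer from 1 to 48, which can serve as a position in a list of 48 names.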
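For readers working outside a spreadsheet, a rough Python counterpart of the random-number step might look like this. The sample size n = 5 is an assumption for illustration; only the list size of 48 comes from the slide.

```python
import random

N = 48  # size of the list, matching =RANDBETWEEN(1,48)

# One random position between 1 and 48 inclusive, like a single RANDBETWEEN call.
pick = random.randint(1, N)

# A simple random sample of n = 5 distinct positions from the same list.
n = 5
sample_positions = random.sample(range(1, N + 1), n)
print(pick, sample_positions)
```

`random.randint` includes both endpoints, so every position from 1 to 48 is equally likely.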


Sampling Methods
With or Without Replacement
• If we allow duplicates when sampling, then we are sampling with replacement.
• Duplicates are unlikely when n is much smaller than N.
• If we do not allow duplicates when sampling, then we are sampling without replacement.
• Strict SRS requires replacement.
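The two schemes, and the claim that duplicates are unlikely when n is much smaller than N, can be sketched as follows. The population of 10 items and the helper name `prob_duplicate` are illustrative assumptions.

```python
import random

items = list(range(1, 11))  # hypothetical population, N = 10

with_replacement = random.choices(items, k=5)    # duplicates allowed
without_replacement = random.sample(items, k=5)  # duplicates impossible
print(with_replacement, without_replacement)

def prob_duplicate(N, n):
    """Chance that n draws with replacement from N items repeat at least one item."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (N - i) / N
    return 1 - p_all_distinct

# With n = 5, duplicates are common when N = 10 but rare when N = 100,000.
print(prob_duplicate(10, 5))
print(prob_duplicate(100_000, 5))
```

For n = 5 and N = 10 the chance of at least one duplicate is about 0.70, while for N = 100,000 it is about 0.0001.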


Sampling Methods
Systematic Sampling
• Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.
• For example, starting at item 2, we sample every k = 4 items to obtain a sample of n = 20 items from a list of N = 78 items.
• Note that N/n = 78/20 ≈ 4.


Sampling Methods
Systematic Sampling
• For example, out of 501 companies, we want to obtain a sample of 25. What should the periodicity k be? k = N/n = 501/25 ≈ 20.
• So, we should choose every 20th company from a random starting point.
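A minimal sketch of the systematic procedure, covering both the 78-item and the 501-company examples. The function name `systematic_sample`, the rounding of N/n to the nearest integer, the choice of a random start among the first k positions, and the wrap-around at the end of the list are all assumptions made here, not details given in the slides.

```python
import random

def systematic_sample(N, n, start=None):
    """Return n list positions (1-based), taking every k-th item with k = round(N / n).

    If no start is given, one is chosen at random among the first k positions.
    Positions wrap around to the top of the list if they run past N.
    """
    k = round(N / n)
    if start is None:
        start = random.randint(1, k)
    return [(start - 1 + i * k) % N + 1 for i in range(n)]

# N = 78, n = 20, start at item 2: k = 4, giving items 2, 6, 10, ..., 78.
first_example = systematic_sample(78, 20, start=2)
print(first_example)

# N = 501, n = 25: k = 20, with a random starting point.
second_example = systematic_sample(501, 25)
print(second_example)
```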


Sampling Methods
Stratified Sampling
• Utilizes prior information about the population.
• Applicable when the population can be divided into relatively homogeneous subgroups of known size (strata).
• A simple random sample of the desired size is taken within each stratum.
• For example, from a population containing 55% males and 45% females, randomly sample 110 males and 90 females (n = 200), so that the sample matches the population proportions.
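Assuming the sample is allocated in proportion to stratum size, the arithmetic for a 55%/45% population and n = 200 can be sketched as below. The function name, the sampling frame of 10,000 IDs, and the use of simple rounding are illustrative assumptions; with other shares, rounded counts may not add up exactly to n.

```python
import random

def proportional_allocation(shares, n):
    """Split a total sample size n across strata in proportion to their population shares."""
    return {stratum: round(n * share) for stratum, share in shares.items()}

allocation = proportional_allocation({"male": 0.55, "female": 0.45}, 200)
print(allocation)  # {'male': 110, 'female': 90}

# Hypothetical sampling frame: IDs grouped by stratum, sizes in a 55/45 ratio.
frame = {"male": list(range(0, 5500)), "female": list(range(5500, 10000))}

# Take a simple random sample of the allocated size within each stratum.
sample = {s: random.sample(frame[s], allocation[s]) for s in frame}
print({s: len(ids) for s, ids in sample.items()})
```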


Sampling Methods
Cluster Sample
• Clusters usually consist of geographical regions.
• One-stage cluster sampling – sample consists of all elements in each of k randomly chosen subregions (clusters).
• Two-stage cluster sampling – first choose k subregions (clusters), then choose a random sample of elements within each cluster.
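The difference between one-stage and two-stage cluster sampling can be sketched as follows. The population of 10 subregions with 30 households each, and the function name `cluster_sample`, are hypothetical.

```python
import random

def cluster_sample(clusters, k, per_cluster=None):
    """Choose k clusters at random.

    One-stage (per_cluster is None): keep every element of each chosen cluster.
    Two-stage: keep a random sample of per_cluster elements from each chosen cluster.
    """
    chosen = random.sample(list(clusters), k)
    if per_cluster is None:
        return {c: list(clusters[c]) for c in chosen}
    return {c: random.sample(clusters[c], per_cluster) for c in chosen}

# Hypothetical population: 10 subregions of 30 households each.
clusters = {f"region_{r}": [f"r{r}_h{h}" for h in range(30)] for r in range(10)}

one_stage = cluster_sample(clusters, k=3)                 # all 30 elements in each of 3 clusters
two_stage = cluster_sample(clusters, k=3, per_cluster=4)  # 4 elements from each of 3 clusters
print({c: len(v) for c, v in one_stage.items()})
print(two_stage)
```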


Sampling Methods
Cluster Sample
• Here is an example of 4 elements sampled from each of 3 randomly chosen clusters (two-stage cluster sampling).


Sampling Methods
Cluster Sample
• Cluster sampling is useful when
  - Population frame and stratum characteristics are not readily available
  - It is too expensive to obtain a simple or stratified sample
  - The cost of obtaining data increases sharply with distance
  - Some loss of reliability is acceptable


Sampling Methods
Judgment Sample
• A nonprobability sampling method that relies on the expertise of the sampler to choose items that are representative of the population.
• Can be affected by subconscious bias (i.e., nonrandomness in the choice).
• Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a certain number of people in each category.


Sampling Methods
Convenience Sample
• Take advantage of whatever sample is available at that moment. A quick way to sample.

Sample Size
• Sample size depends on the inherent variability of the quantity being measured and on the desired precision of the estimate.
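To make the dependence on variability and precision concrete, the sketch below uses one common textbook formula for estimating a mean, n = (z·σ/E)², where σ is the standard deviation, E the desired margin of error, and z the confidence multiplier. The slide itself gives no formula, so the formula, the function name, and the default z = 1.96 (roughly 95% confidence) are all assumptions.

```python
import math

def sample_size_for_mean(sigma, margin, z=1.96):
    """Sample size from n = (z * sigma / margin)^2, rounded up to a whole item."""
    return math.ceil((z * sigma / margin) ** 2)

base = sample_size_for_mean(sigma=10, margin=2)           # baseline
more_variable = sample_size_for_mean(sigma=20, margin=2)  # doubled variability
more_precise = sample_size_for_mean(sigma=10, margin=1)   # halved margin of error
print(base, more_variable, more_precise)
```

Under this formula, doubling the variability or halving the margin of error each roughly quadruples the required sample size.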


Data Sources
Useful Data Sources

Type of Data         Examples
U.S. general data    Statistical Abstract of the U.S.
U.S. economic data   Economic Report of the President
Almanacs             World Almanac, Time Almanac
Periodicals          Economist, Business Week, Fortune
Indexes              New York Times, Wall Street Journal
Databases            CompuStat, Citibase, U.S. Census
World data           CIA World Factbook
Web                  Google, Yahoo, msn


Data Errors
Sources of Error

Source of Error     Characteristics
Nonresponse bias    Respondents differ from nonrespondents
Coverage error      Incorrect specification of frame or population
Response error      Respondents give false information
Interviewer error   Responses influenced by interviewer
Measurement error   Survey instrument wording is biased or unclear
Sampling error      Random and unavoidable

