
Modern Microeconomic Analysis for Business Strategy

(a Provisional and Incomplete Text)
Spring 2010
Jim Dewey, University of Florida
Sam Selikoff, University of Florida and Gator Tutoring
© 2010 James F. Dewey



Preface

Like top colleges and universities across the nation, UF has become very selective in admissions. The SAT math scores of incoming UF freshmen are on par with those at the universities with the nation’s top 20 undergraduate business programs. This increase in selectivity at top schools has been accompanied by a decrease in selectivity at other schools, where most students attend and where most books are sold. As a result, managerial economics texts have deemphasized mathematical rigor at just the point in time when it is of most value for UF students.

At the same time, textbook prices have soared. The Florida legislature has required explicit justification for the use of any expensive new textbook, and UF has encouraged faculty to provide low cost materials. This mirrors a national movement among the best universities to make online course materials available free – two examples are the extensive OpenCourseWare available from MIT and Preston McAfee’s (Caltech) text Introduction to Economic Analysis. Provision of high quality free online course materials is becoming a hallmark of top universities.

This draft textbook addresses the need for a text that is appropriate in its level of academic rigor and topic selection for UF students taking Managerial Economics and is also available at low cost to my students. The text is available free on the course website, and a printed version is available at Target Copy for approximately $20.

The text started as a very detailed set of notes based on my lectures, created by Sam Selikoff, a former teaching assistant. I have gone through them adding, clarifying, editing, and shaping them into chapters, with Sam’s help. Further, they have been proofread and edited twice by another teaching assistant, Michael Canencia. In addition, all errors noted by students during the Fall 2009 semester have been corrected in the current version. However, some chapters have received much more attention and revision than others at this stage!

Writing a textbook is a long and difficult task – this is very much a work in progress. The text is provisional and incomplete, and I have no doubt many errors remain – you must not rely solely on it in my class. Anything covered in lecture is fair game for exams, even if the corresponding material in the text exhibits errors or is incomplete. Even with these imperfections, most students said they found it very useful last semester – much more useful than students have found alternative textbooks. So, even with its flaws, I think it will be quite helpful in your studies, as long as you remember it is an imperfect work in progress and therefore…

USE WITH CAUTION!

Jim Dewey
1/1/2010



Contents

Part 1: Analytical Approach and Tools
Chapter 1    Introduction    1
Appendix to Chapter 1    Math Used in Managerial Economics    24
Chapter 2    Cost, Demand, and Profit Maximization    43
Chapter 3    Applications and Extensions of Optimal Production and Pricing    64

Part 2: Empirical Approximations and Econometrics
Chapter 4    Estimating and Interpreting Approximations    90
Chapter 5    Evaluating Regression Analyses    110
Chapter 6    Omitted Variables Bias    130

Part 3: A Closer Look at Some of the Tools
Chapter 7    Individual Choice    142
Chapter 8    Applications and Extensions of Consumer Theory    159
Chapter 9    Non-Linear Pricing    170
Chapter 10    Uncertainty with Risk Aversion    187
Chapter 11    More on Production and Cost    203

Part 4: Game Theory – Modeling Strategic Interaction
Chapter 12    One Shot Games with Discrete Strategies    222
Chapter 13    One Shot Games with Continuous Strategies    238
Chapter 14    Repeated Games    249

Part 5: Product Market Structure, Strategy, and Analysis
Chapter 15    Homogenous Product Markets    264
Chapter 16    Differentiated Product Markets    282
Chapter 17    Perfect Competition    295
Chapter 18    Applications of Supply and Demand Analysis    308
Chapter 19    Market Structure Wrap Up    315

Part 6: Firm Structure
Chapter 20    Input Procurement and Contracting    330
Chapter 21    The Firm    347


Part 1: Analytical Approach and Tools



Chapter 1 Introduction

Aim of the Course

Microeconomics is the study of how individuals and organizations allocate scarce resources to achieve their ends. That includes the nature of the interactions between those individuals and organizations, especially in markets. All microeconomic analysis texts share much in common, since the core tools of microeconomic analysis are relatively few. However, this book focuses on applying those tools systematically to problems faced by firms and their managers. Modern microeconomics offers an analytical approach that can help firms and their managers efficiently organize their thoughts when faced with business decisions. Further, applied microeconomic analysis underpins much of the information upon which business students will base professional decisions throughout their careers. Some examples include: 1) research referenced in newspapers, magazines, and trade journal articles about the market in which their firm operates, 2) reports conducted for firms by in-house analysis groups, 3) research conducted for firms by consultants, and 4) studies conducted by government agencies pursuant to regulatory or other proceedings. Some of these analyses will be quite good. Others will be quite bad. Relying on bad analyses, or using good ones incorrectly, leads to wasteful, and potentially disastrous, decisions. Managers need to understand the tools of microeconomic analysis well enough to be able to spot bad information and to use the good information appropriately.

The book first builds an analytical toolbox and then applies it in a rigorous manner to individual topics. While the individual topics are of some interest in their own right, the overarching goal is to show commonalities in the analytical process through repeated application. It is impossible to learn to undertake advanced analyses from a single book or a single undergraduate class. But, through repeated application, it is possible to gain enough insight into economists’ basic tools to focus clearly and critically on the important economic aspects of a problem and also to allow intelligent evaluation of advanced analyses conducted by others. This book is intended for use in a course focused on intermediate microeconomic analysis, especially of business problems (managerial economics). As such, it presumes both a solid grounding in microeconomic principles and basic calculus. However, the text is sufficiently self-contained for a diligent student who has never had a principles of microeconomics course and whose calculus knowledge has become quite rusty to master the material.

A widely accepted way of classifying the learning objectives common in traditional education is given by the cognitive domain of Bloom’s taxonomy (see http://en.wikipedia.org/wiki/Bloom%27s_Taxonomy). It



places learning objectives into six (more or less) hierarchical categories: 1) knowledge (remembering), 2) comprehension (understanding), 3) application, 4) analysis, 5) synthesis, and 6) evaluation. While many lower level college courses focus largely on the first three levels, as the words themselves imply, microeconomic analysis focuses on higher level learning objectives. Memorizing terms, understanding concepts, and being able to apply them in situations you have seen before are necessary to learning economic analysis, but far from sufficient. Repeatedly working the largest possible set of practice problems until you can do them backwards and forwards will help with the first two levels and somewhat with the third, but will not help you master the higher order learning objectives. You must deconstruct each new concept you encounter into its constituent pieces and make sure you understand it from any possible angle and can generalize it and adapt it to completely new situations. You must practice analyzing situations you have never before encountered, synthesizing information from various sources to reach insights that have never been explained to you in the past, and using the results of your analyses to evaluate alternative courses of action or potential solutions to problems.

The Goal of the Firm (and its Management)

Firms procure inputs and use them in turn to produce goods or services that are of value to their customers. The difference between the value of the firm’s output and the cost of the inputs used is the value added by the firm. If the share of the value of its products which it is able to capture in the form of revenue exceeds the cost of production, some residual revenue will be left over as profit. Mathematically,

π = R − C    (1.1)

where π is profit, R is total revenue, and C is total cost. Who gets to claim this residual revenue (profit)? The firm’s shareholders are the residual claimants in this case. The value of all shares in the firm, in fact, will equal (approximately) the expected present value of all future profits, though we still need to define exactly what we mean by expected present value. Since the shareholders prefer to be wealthier, they will seek to provide their agents, the firm’s management, with incentives to maximize the value of their shares. Thus, the primary goal of the firm’s management is to utilize the limited resources available to them to maximize the expected present value of future profits.

Present Value

The difference between the future value and present value of a sum of money depends on the time value of money as reflected in the interest rate (r). If the interest rate is 5%, for example, after one period an initial $100 will have earned $5 in interest, so the initial amount will have grown to $105. Every time interest compounds, the value at the end of the period is 1.05 times the value at the end of the previous period. After two periods, the same $100 would be worth 1.05 times



$105, or $110.25. Generally, if the interest rate is r, the future value (denoted FV) of any given initial or present value (denoted PV) after t time periods is

FV = PV(1 + r)^t.    (1.2)

This equation compounds the present value forward by a factor, determined by the length of time and the interest rate, to obtain the future value. Since any initial sum growing at a given interest rate will grow into a larger sum in the future, any given future value is worth less than that future value at the present time. Dividing both sides of equation (1.2) by (1 + r)^t, we find the following expression for the present value of a future amount:

PV = FV/(1 + r)^t.    (1.3)

This equation discounts the future value backward by a factor, determined by the length of time and the interest rate, to obtain the present value.

Example: Present/Future Value

If the interest rate is 10% and you invest $1 today, what is the future value three years from now? Solution: Use the equation for future value, observing PV = $1, r = 0.10, and t = 3.

FV = PV(1 + r)^t = $1(1 + 0.1)^3 = $1(1.1)^3 = $1.331

If the interest rate is 10% and you receive $3 two years from today, what is the present value? Solution: Use the expression for present value, observing FV = $3, r = 0.10, and t = 2.

PV = FV/(1 + r)^t = 3/(1 + 0.1)^2 = 3/1.1^2 ≈ $2.48
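These compounding and discounting calculations are easy to script. Below is a minimal Python sketch (the function names are our own, for illustration) that reproduces the two computations above:

```python
def future_value(pv, r, t):
    # Compound a present value forward t periods at rate r: FV = PV(1 + r)^t.
    return pv * (1 + r) ** t

def present_value(fv, r, t):
    # Discount a future value back t periods at rate r: PV = FV / (1 + r)^t.
    return fv / (1 + r) ** t

print(future_value(1, 0.10, 3))   # 1.331, the $1.331 found above
print(present_value(3, 0.10, 2))  # about 2.48, as in the example
```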

Of course, future profits are not realized in only one period. To find the present value of a series of profits realized over any number of years, we simply add up the present value of each individual profit realization. So, if π_t represents the value of profit realized after period t, the value of the firm at time 0 (the present) would simply be



V_0 = π_0 + π_1/(1 + r) + π_2/(1 + r)^2 + π_3/(1 + r)^3 + ⋯ = Σ_t π_t/(1 + r)^t.    (1.4)

Example: Net Present Value of a Project

Suppose a project involves an expenditure of $124 currently, and will return $21 after one year, $72 more after 2 years, and $45 after the third year, at which point the project ends. If the interest rate is 8%, what is the net present value of the project? Solution: Use the equation for net present value, observing r = 0.08, π_0 = −124, π_1 = 21, π_2 = 72, and π_3 = 45.

PV = −124 + 21/1.08 + 72/1.08^2 + 45/1.08^3 ≈ −7.10
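Since equation (1.4) is just a sum of discounted terms, the net present value of any stream of profits can be computed in a short loop. A sketch, assuming profits are listed in order beginning with period 0 (the function name npv is ours, not from a library):

```python
def npv(profits, r):
    # Net present value of a profit stream; profits[t] is realized after period t.
    return sum(pi / (1 + r) ** t for t, pi in enumerate(profits))

print(npv([-124, 21, 72, 45], 0.08))  # about -7.10, matching the example
```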

Information Structure

Often, calculating a firm’s value, or the net present value of any particular investment or project, is impossible to do with complete precision, since there is no way to know for sure what the future will bring. The structure of information – who knows what, when they know it, and how certain they are in that knowledge – is important in analyzing any decision. We consider three possible information structures.

Complete and Perfect Information

When everyone knows exactly what the future holds, and knows that everyone has the same information, it is referred to as complete and perfect information. This is, obviously, the simplest possible information structure. In this case, calculating the present value of profit, or the present value of any cash flow project, is a straightforward application of equation (1.4).

Simple Risk and Uncertainty

We will think of situations involving uncertain outcomes as lotteries, in which every possible outcome might occur with some probability. Risk refers to a situation where everyone shares (more or less) common estimates of the probabilities of each possible outcome based on the laws of probability or rigorous empirical analysis. Uncertainty encompasses risk and broader cases of imperfect information where there is not enough information to form assessments of the probabilities of every possible outcome based on agreed upon laws of probability or empirical regularities. In that case, everyone must form their own subjective probability assessments that reflect their best guesses about the probability of each possible



outcome based upon whatever information is available to them. If everyone is exactly identical and all information is shared in common, everyone will reach common probability estimates. When people differ in their experiences, knowledge, and abilities, their subjective probability estimates will differ. Uncertainty affects virtually every decision taken by a firm’s management. For our purposes, sometimes we may assume away the uncertainty to focus on other important aspects of a problem in situations where the uncertainty itself is not important to the main point under consideration. However, a great deal of the material in the text will explicitly incorporate incomplete information in the form of uncertainty about future events.

Asymmetric Information

Sometimes, some individuals have better or more accurate information about uncertain contingencies than everyone else. While such asymmetric information may play a role in a number of important markets, it is an advanced topic which we will not return to in detail until much later. At this point, though, it is worth noting that this phenomenon can lead to severe market failure. Typically, such market failures take one of two forms.

Adverse selection occurs when one individual has information about inherent characteristics of the problem that is not available to others. To see why this matters, consider the market for used cars. For purposes of our example, suppose used cars are either high quality or low quality. Further, suppose individuals who own used cars and are seeking to sell them know the quality of the car, but potential buyers do not. Those looking to sell high quality cars will only sell them at a high price. However, if there are enough low quality cars out there, buyers will not be willing to pay a high price for any used car, for fear they will get a low quality car that was not worth the money. Anticipating this, owners of the highest quality used cars will simply not offer them for sale, and will instead hold onto them longer than they otherwise would. The market for the highest quality used cars would simply not exist if information were too asymmetric.

Moral hazard occurs when one individual has information about an action they have taken that is not available to others and is therefore shielded from the consequences of their actions. For example, an hourly employee whose work rate is hard to monitor may not work as hard as someone whose work rate is easier to monitor. This can create a role for incentive contracts. As another example, all else equal, someone with complete homeowners insurance coverage may take less care in preventing losses due to, say, fire damage. This creates a need for deductibles and for explicit incentives in the form of policy discounts for homeowners to undertake safety investments.

Expected Value and Attitudes Toward Risk

When an entity such as an individual or a firm faces uncertainty (whether or not information is symmetric), their attitude toward risk, as well as the degree of the risk itself, affects their evaluation of the options they face. Therefore, we will



consider the implications of different information structures and attitudes toward risk. Before doing so, however, it is useful to first introduce two concepts, expected value (EV) and the certainty equivalent (CE). Roughly speaking, expected value is the average outcome if a given lottery is played a very large number of times. More specifically, if i is an index of the possible outcomes (so that i = 1 for the first possible outcome, and so on), x_i is the value if outcome i occurs, and f_i is the probability of outcome i, the definition of expected value of x, denoted E(x), is

E(x) = Σ_i f_i x_i.    (1.5)

We will often use f to denote probabilities. But, sometimes we will write Pr(x_i) to denote the probability that x takes on the specific value x_i.

Example: Expected Value

Suppose the probability that profit is $40 is 0.8; otherwise, profit is −$100. Find the expected value of this gamble. Solution: Use the definition of expected value, observing that Pr(π = 40) = 0.8 and Pr(π = −100) = 0.2.

E(x) = Σ_i Pr(x_i) x_i = 0.8(40) + 0.2(−100) = 12

Notice that we found the expected value to be $12. When the uncertainty is resolved, profit will either be $40 or it will be −$100. What, then, is the interpretation of the $12? If this gamble were taken 100 times, about 80 times profit would be $40, and the other 20 times profit would be −$100; the average per-period profit would then be $12.

Now imagine facing a choice between a lottery on one hand and a certain sum of money on the other. If the sure thing is a low enough value, the lottery will be preferred, and if the sure thing is a high enough value, the sure thing will be preferred. The certainty equivalent (CE) of a lottery is the sum of money for certain that is viewed as exactly equivalent to the lottery. A risk neutral entity is indifferent toward risk – they care only about the expected value. For them, the certainty equivalent and the expected value are equal (CE = EV). Faced with a choice between the lottery in the example above and $12 for certain, a risk neutral individual would be indifferent. Most individuals, however, are risk averse, meaning the certainty equivalent of a lottery is less than its expected value from their point of view (CE < EV). They would choose $12 for sure rather than the lottery above. For someone who is risk loving, the certainty



equivalent of a lottery is higher than the expected value (CE>EV). They would choose the lottery above over $12 for certain.
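Equation (1.5) translates directly into code once a lottery is written down as probability/payoff pairs. A minimal sketch (the data layout and names are our own):

```python
def expected_value(lottery):
    # Expected value per equation (1.5): sum of probability times payoff.
    return sum(f * x for f, x in lottery)

gamble = [(0.8, 40), (0.2, -100)]  # the example: $40 with prob 0.8, -$100 with prob 0.2
print(expected_value(gamble))      # 12
# A risk neutral entity values this gamble at exactly its EV of 12 (CE = EV);
# a risk averse one has CE < 12, and a risk loving one has CE > 12.
```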

Expected Present Value and the Value of the Firm

While most individuals are risk averse, we will assume firms evaluate uncertain options in an approximately risk neutral manner. Why? First, it is the simplest possible model, allowing us to focus on other, more important issues until we learn to model risk aversion (in Chapter 10). Second, individual stockholders can diversify by buying shares in many different firms and holding any number of other types of assets, such as bonds, currency, or real estate. To the extent the risks associated with different assets in the portfolio are independent, diversification reduces the aggregate risk in the portfolio. In the most extreme case, if shareholders could diversify away all the risk in their portfolio (so below-expected returns on some investments are exactly offset by above-expected returns on others), they would simply want the firms in which they held shares to make the choices that maximize expected profits, regardless of the apparent risk to the individual firm. In reality it is not possible to diversify away all risk in a portfolio. But, the uncertainty in the return of an individual stockholder’s diversified portfolio is far less than the uncertainty of the expected profit of an individual firm in that portfolio. Further, the relationship between the expected return of a well diversified portfolio and the expected return to any given stock in that portfolio is much stronger than the relationship between the overall uncertainty about the return of the portfolio and the uncertainty about the return of an individual firm’s stock in that portfolio. So, it makes sense to model firms as if managers care predominantly about expected profit, with the degree of uncertainty only a secondary concern.

Above, we argued the value of a firm was approximately equal to the present value of future profits when there was no uncertainty. If the value of a firm is evaluated in an approximately risk neutral manner, the value of the firm will be the expected present value of future profits (EPV). The idea is to find the expected value of profit in every future period and then discount those values back to the present. Mathematically, EPV is

EPV = Σ_t E(x_t)/(1 + r)^t = Σ_t [Σ_i f_it x_it]/(1 + r)^t.    (1.6)

The value of the firm at time 0 is then

V_0 = Σ_t [Σ_i f_it π_it]/(1 + r)^t.    (1.7)

Example: Expected Present Value

Suppose a firm is considering acquiring the rights to a project that has an uncertain return over the following two years. After one year, the project will



return $100 with probability 0.4; otherwise it will return −$200 at that time. After two years, there is a 0.7 probability the project will return $400 in addition to the first year returns; otherwise it will return an additional −$200. What is the most the firm should be willing to pay to acquire this project? Assume an interest rate of 7%. Solution: Using the definition of expected present value, calculate the expected value of the return in each period, discount it back to the present, and sum.

EPV = [0.4(100) + 0.6(−200)]/1.07 + [0.7(400) + 0.3(−200)]/1.07^2 = −74.77 + 192.16 ≈ 117.39

This expected present value is what the project is worth to us today. The value of a firm is nothing more than the expected present value of all future cash flows the firm may generate from all projects it may undertake.
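Expected present value combines the two ideas above: take the expected value of each period's lottery, then discount it back. A sketch that retraces the example (the list layout and names are ours; the returns realized after period t + 1 sit at index t):

```python
def expected_present_value(lotteries, r):
    # EPV per equation (1.6): lotteries[t] holds the (probability, payoff)
    # pairs for the return realized after period t + 1.
    return sum(
        sum(f * x for f, x in lottery) / (1 + r) ** (t + 1)
        for t, lottery in enumerate(lotteries)
    )

project = [
    [(0.4, 100), (0.6, -200)],  # year 1 returns
    [(0.7, 400), (0.3, -200)],  # year 2 returns
]
print(expected_present_value(project, 0.07))  # about 117.39, the most to pay
```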

Value of Information – Part 1 – Yes/No Decisions

In a world of uncertainty, supplemental information helps managers make better profit-maximizing decisions. Every manager has their own “best guess” about the future, based on the internal knowledge of the firm. However, if a manager is able to accumulate additional outside information, perhaps through hiring a consultant, they may be able to readjust their perceptions of the future. It follows from this that information can be valuable to managers. In fact, it is valuable to the extent it leads the manager to update their probability estimates and to therefore alter their decisions.

We start with the simplest kind of decision, where a firm faces a single, dichotomous decision. Imagine, for example, we are considering drilling an oil well, but are uncertain as to whether or not we will strike oil. It costs money to build the rig, so if we hit oil we will make a profit, but if we don’t hit oil we will have lost the cost of building the rig. We have our own ideas about how likely it is that we will strike oil, based on our previous knowledge, experience, and other freely available information. But, we also have the option to hire a geological consultant who could run tests and analyze samples and tell us what their “informed” opinion is. Consultants aren’t perfectly accurate; they merely have information that allows us to update our own probability estimates. So the question becomes: what is it worth to us to have better, though still imperfect, information?

First, consider whether or not we should proceed based only on our own information. Suppose this project has two possible outcomes, call them success (S) or failure (F), and that the probability of success is Pr(S) and the probability of failure is Pr(F) = 1 − Pr(S). The payoff for success is π_S and the payoff for failure is π_F, where π_S > 0 and π_F < 0. Finally, assume we do not need to worry about discounting, for simplicity.



The figure below shows the situation in a simple decision tree. The firm’s decision to proceed or not is represented graphically by the white square at the left. If they choose not to proceed (the top path through the figure), the probability is 1 (the dashed box) that they will earn a payoff of $0. If, however, they proceed, they are not sure what will happen. In that case, whether or not they succeed depends on the resolution of uncertainty regarding the underlying state of nature, represented by the shaded circle. If conditions turn out to be favorable, which occurs with probability Pr(S), the firm makes a profit of π_S. If conditions turn out to be unfavorable, which occurs with probability Pr(F), the firm makes a profit of π_F.

[Decision tree]
Don’t Proceed → payoff 0 (probability 1)
Proceed → payoff π_S with probability Pr(S); payoff π_F with probability Pr(F)

How should we decide whether or not to proceed? If both payoffs are positive, there is no ambiguity: undertake the project. Similarly, if both payoffs are negative, reject the project. In either case, additional information cannot change the decision, and so has no value. Otherwise, to determine if we should undertake the project or reject it, calculate expected profit:

E(π) = Σ_i Pr(π_i)π_i = Pr(S)π_S + Pr(F)π_F

If this value is positive, undertake the project; otherwise, reject it. Given only this information about the project, we can say that our expected profit with no additional information beyond our guesses about these probabilities (NoInfo) is

E(π | NoInfo) = Max(Pr(S)π_S + Pr(F)π_F, 0).    (1.8)

The reason the above expression is written as the maximum of two arguments is that if the expected profit were negative, you would not proceed, and would actually earn 0, not a negative amount. So, 0 is the “worst” possible expected profit in this case.

Example: Expected Profit with Initial Information

Suppose a project has the following values: Pr(S) = 0.5, π_S = 100, and π_F = −60. What is the expected profit without additional information? Solution: Use the above definition of the expected profit given no additional info.



E(π | NoInfo) = Max(0.5(100) + 0.5(−60), 0) = Max(20, 0) = 20

Since the expected profit is positive, we will undertake the project with an expected profit of $20. The scenario is illustrated in the decision tree below.

[Decision tree]
Don’t Proceed → 0 (probability 1)
Proceed → 100 with probability 0.5; −60 with probability 0.5, so E(π) = 0.5(100) − 0.5(60) = 20 > 0
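Because the Max in equation (1.8) folds the proceed/don't-proceed decision into the expected profit itself, the whole yes/no calculation fits in one small function. A sketch (names are ours):

```python
def expected_profit(pr_success, pi_s, pi_f):
    # Expected profit from a yes/no project per equation (1.8): proceed only
    # when the expectation is positive, otherwise earn 0.
    return max(pr_success * pi_s + (1 - pr_success) * pi_f, 0)

print(expected_profit(0.5, 100, -60))  # 20: proceed
print(expected_profit(0.2, 100, -60))  # 0: at these odds we would not proceed
```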

While the expected payoff in the previous example was positive, which makes the project seem like a good idea, there is a significant chance the project will be unsuccessful, leading to a loss. More precise information about the chances of success in this particular case might allow us to avoid that loss. In another situation, where expected profits were lower, we might decide not to proceed, even though there is a chance of profit. In that case, additional information might have convinced us the potential return was worth the risk.

Suppose we can buy additional information that will tell us more about the project’s probability of success. Before we buy the report, we don’t know what it will say; we just know that it will provide us with some new signal about how likely our project is to succeed. For simplicity, assume that once we buy this information, it will either tell us “good news” (GN) or “bad news” (BN). There is some probability of good news, Pr(GN). The probability of bad news, Pr(BN), is simply 1 − Pr(GN). If we get good news, we will formulate a new opinion about the probability of success, conditional on that news, Pr(S|GN). Similarly, if we get bad news, we will have a new estimate of the chance of success, conditional on that report, Pr(S|BN).

The process of updating our initial or prior assessment of the likelihood of success is an example of Bayesian updating, so called because of its association with Bayes’ theorem. Assuming good news is relatively more likely if the state of nature is such that the project will be successful, we revise our assessment of the probability of success up with good news and down with bad news. But, it should be noted that “good news” and “bad news” are just labels. Suppose a consultant was known for almost always being wrong – when he says a project is most likely to succeed, it almost always fails, and vice versa. That consultant’s report is very valuable – as long as we know to do the opposite of what he indicates. In that case, it would be “good news” if the consultant said the project were most likely to fail.

How might we actually calculate these initial and updated probabilities? In the simplest and most concrete case, there is a firm empirical basis from which to infer



probabilities. This is easiest to illustrate with an example. Suppose we purchased similar information several times in the past. Each time, either we proceeded with our project whether the report signaled good news or bad, or, if we did not proceed, we later learned whether the project would have succeeded had we gone forward. Further, suppose we kept detailed records and we believe past outcomes and observations are a reasonable basis from which to accurately infer future probabilities. The results are shown in the table below.

                 Outcome
Report       Success   Failure   Total
Good News       6         2         8
Bad News        4         8        12
Total          10        10        20

From this historical data, we can estimate the probability of success given good news or bad news. The probability of success given good news is how many times we’ve succeeded after receiving a report of good news, divided by how many times we’ve had good news, or

Pr(S | GN) = 6/8 = 3/4.

Given that we received a good report, we now think the probability of success is 0.75. Those who have had and remember statistics will recognize this as a straightforward calculation of a conditional probability. Similarly, the probability of success given bad news is how many times we’ve succeeded after receiving bad news, divided by how many times we’ve been given bad news, or

Pr(S | BN) = 4/12 = 1/3.

Before moving on, note that we can also use this information to calculate the probability of success with no additional information and the probability that we will receive good news if we buy the report. Since success occurs ten times out of twenty in total, the probability of success if no additional information is observed is

Pr(S | NoInfo) = 10/20 = 1/2.

Similarly, of the 20 times the report was purchased, it yielded good news eight times. So, the probability we will receive good news if we buy the report is

Pr(GN) = 8/20 = 2/5.



In practice, it is unlikely that a manager’s probability estimates will be based entirely on this sort of precise empirical analysis. At the other extreme, a manager may have only their best subjective guess based on their previous experiences in cases that are roughly similar and the impressions they get from talking with whomever they are considering purchasing the additional information from. Often, the situation will fall between those two extremes. There may be some data based on previous situations. However, those situations may not be a perfect match for the current one. And, while experts who can provide additional information will have established track records, those track records may not be perfect indicators of future performance for many possible reasons. So, the manager is left to fill in the gaps in what they can empirically estimate with their own subjective evaluation of the situation and the quality of the information they are considering purchasing. “Good” managers are good at taking whatever information is available and formulating guesses about the actual but unknown state of nature and acting accordingly.

With this understanding of how we might arrive at probability estimates, we return to our consideration of the value of additional, but still imperfect, information. The decision tree below summarizes the situation, as so far described. The open square at the far left denotes the initial decision we face, to buy the information or not. If we decide not to buy the additional information, the situation is just like before – we face a decision to drill or not based only on our initial probability estimates. If, however, we decide to buy the information, uncertainty about the nature of the report is resolved (indicated by the shaded circle) and we get either good news or bad news. We then face another decision – whether to drill or not. However, our assessment of the likelihood of success is different with good news than with bad news; hence, we may make a different decision with good news than with bad news. That potential to change our decision regarding whether or not to proceed with the project is what gives additional information value. Information that is too imprecise to make a difference in our decision has less value. The more closely the new information allows us to estimate the true underlying state of nature, the more valuable it is.



[Decision tree]
Don’t Buy:
    Don’t Proceed → 0
    Proceed → π_S with Pr(S); π_F with Pr(F)
Buy:
    Good News (probability Pr(GN)):
        Don’t Proceed → 0
        Proceed → π_S with Pr(S|GN); π_F with Pr(F|GN)
    Bad News (probability Pr(BN)):
        Don’t Proceed → 0
        Proceed → π_S with Pr(S|BN); π_F with Pr(F|BN)

So, how do we go about determining whether we should buy the report and what it is worth? We need to calculate expected profit conditional on purchasing the additional information, E(π | Info). Assuming we perform that calculation ignoring the cost of attaining the additional information, the value of the report is simply the increase in expected profit, and we should buy the report if its value exceeds its cost. How, in turn, do we determine expected profit conditional on having purchased the information? In this sort of problem, we will usually start at the end and work our way back toward the beginning. That means we must calculate what expected profit would be with good news, E(π | GN), and with bad news, E(π | BN). Each of those is calculated just like we calculated expected profit with no additional information, except that we have different probabilities, depending on the information we received. Expected profit with the information is then just the probability of good news times expected profit conditional on good news plus the probability of bad news times the expected profit conditional on bad news. Whether we receive good news or bad news, we will proceed only if expected profit based on our new estimate of the probability is positive; otherwise we will not, and profit will be 0. Therefore, expected profit given good news is

E(π | GN) = Max(Pr(S | GN)π_S + Pr(F | GN)π_F, 0).    (1.9)

It is important to note that if expected profit given good news is negative, we will not proceed even with good news and the highest possible estimate of the probability of success. We therefore would not proceed with no information or with



a report of bad news, either. Since the information cannot affect our decision, it has no value in that circumstance.

Example: Expected Profit with Good News

Suppose the probability of a successful project given a good report (good news) is 3/4, and that success will result in a cash flow of $100, while a failure will result in a loss of $60. Find the expected profit given a good report. Solution: Use the information given (observing that the probability of failure given a good report is 1 − 3/4 = 1/4) and the definition of expected profit given good news.

E(π | GN) = Max((3/4)(100) + (1/4)(−60), 0) = Max(60, 0) = 60

Because expected profit with good news is positive, we would want to continue with the project if we receive good news. If this value had been negative, we could immediately conclude that the report would be useless. Moving on, expected profit given bad news is

E(π | BN) = Max(Pr(S | BN)π_S + Pr(F | BN)π_F, 0).    (1.10)

If expected profit given bad news is negative, we will abandon the project, getting a payoff of 0; if it’s positive, we will proceed. A similar conclusion to the one above is that if our expected profit given bad news is positive and we proceed with the project even with the worst possible news about the chances of success, the information has no value. This is because if we were to buy a report and it came up unfavorable, we would still proceed with the project. That means we will of course proceed with no information or with good news. Since the report would have no effect on our final decision in such a case, it would have no value.

Example: Expected Profit with Bad News

Suppose that based on previous reports, a firm estimates that the probability of a successful project given a bad report (bad news) is 1/3, and that success will result in a cash flow of $100, while a failure will result in a loss of $60. Find the expected profit given a bad report. Solution: Use the information given (observing that the probability of failure given a bad report is 1 − 1/3 = 2/3) and the definition of expected profit given bad news.



E(π | BN) = Max((1/3)(100) + (2/3)(−60), 0) = Max(−20/3, 0) = 0

Since we would not proceed after receiving bad news, the information may have some value, and we can continue with our valuation. If this number had been positive, the report would have been worthless. Now that we know how to calculate both expected profit given good news and expected profit given bad news, we can calculate the expected profit after buying the additional information. There are two possible outcomes for the report: good news or bad news. Each outcome has an expected profit associated with it – the two expected profits we just defined. So, expected profit with additional information is

E(π | Info) = Pr(GN)E(π | GN) + Pr(BN)E(π | BN)

where Pr(GN) is the probability the report gives good news, and Pr(BN) is the probability the report gives bad news. We know from our above discussion that for the information to be valuable E(π|BN) must be 0, so the second half of the equation falls out. Assuming the conditions outlined above hold, so that the report is in fact valuable, expected profit with information then becomes

E(π | Info) = Pr(GN)E(π | GN).

The maximum amount that the firm would be willing to pay for the information is simply how much higher the firm expects profits to be with the information. Thus, the value of information is

InfoValue = E(π | Info) − E(π | NoInfo).

As long as the report costs less than this amount, the firm will buy it. (The firm is exactly indifferent if the cost of the information equals its value.)

Example: Expected Profit with Information and Information Value (Continuation)

Suppose, based on previous reports, a firm estimates that the probability of receiving good news if they buy a report is 0.4. Using the previously calculated expected profits with good news and bad news, calculate the firm’s expected profit with additional information. Based on the previous examples, how valuable is a report to the firm? Solution: Use the definition of expected profit given information, then find the difference between the firm’s expected profits with and without the information.

E(π | Info) = Pr(GN)E(π | GN) = 0.4(60) = 24



InfoValue = E(π | Info) − E(π | NoInfo) = 24 − 20 = 4

Our running example is illustrated in full in the decision tree below.

[Decision tree]
Don’t Buy:
    Don’t Proceed → 0
    Proceed → 100 with probability 1/2; −60 with probability 1/2: E(π) = (1/2)100 − (1/2)60 = 20
Buy:
    Good News (probability 2/5):
        Don’t Proceed → 0
        Proceed → 100 with probability 3/4; −60 with probability 1/4: E(π | GN) = (3/4)100 − (1/4)60 = 60
    Bad News (probability 3/5):
        Don’t Proceed → 0
        Proceed → 100 with probability 1/3; −60 with probability 2/3: E(π | BN) = (1/3)100 − (2/3)60 = −20/3
    E(π | Info) = (2/5)(60) + (3/5)(0) = 24
Value = 24 − 20 = 4
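The entire running example can be retraced in a few lines, reusing the expected_profit sketch from earlier (all numbers come from the examples above):

```python
def expected_profit(pr_success, pi_s, pi_f):
    # Proceed only if the expectation is positive; see equation (1.8).
    return max(pr_success * pi_s + (1 - pr_success) * pi_f, 0)

pi_s, pi_f = 100, -60

e_no_info = expected_profit(1/2, pi_s, pi_f)  # 20: proceed without a report
e_gn = expected_profit(3/4, pi_s, pi_f)       # 60: proceed on good news
e_bn = expected_profit(1/3, pi_s, pi_f)       # 0: abandon on bad news

pr_gn = 2/5
e_info = pr_gn * e_gn + (1 - pr_gn) * e_bn    # 24
info_value = e_info - e_no_info               # 4: buy any report costing less

print(e_no_info, e_gn, e_bn, e_info, info_value)
```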

What’s to Come

Math is an incredibly useful modeling tool in economics. It allows us to give clear and concise expression to the most important and fundamental concepts. In addition, as is probably obvious from the material in the first chapter, we will make extensive use of mathematical examples to illustrate the main concepts and to practice using the tools of economic analysis. Therefore, the appendix to this chapter provides a reasonably thorough and self-contained math review. However, if you are really rusty on your math, you may need to pull out the old textbooks for a more in-depth review.

The remainder of the book will build on the material introduced in this chapter to analyze decisions firms and their managers must make when trying to maximize the expected present value of profit. The figure below is a conceptual depiction of the flows of inputs, goods and services, and funds and of the interactions between the firm, suppliers, customers, and competitors that govern the firm’s profitability.



[Figure: circular flow diagram. The firm buys inputs (labor, materials, capital) in input markets, incurring costs; it sells products in product markets, alongside competitors, earning revenue; customers earn income by supplying inputs and spend it on products.]

Chapters 2‐14, comprising the remainder of Parts 1 through 4 of the text, cover the basic tools of economic analysis and consider numerous applications of these tools to business problems. Chapter 2 takes a broad look at the cost side of the profit equation, determined by the prices of inputs and how efficiently they are used, and the revenue side, determined by the demand from customers. It then sets forth and examines the basic rule for maximizing profit: if doing more of something increases profit, do it more, and if doing more of that thing decreases profit, do it less. Of course that sounds ridiculously simple, but really understanding how to apply that basic insight in many different situations and understanding its ramifications for the ways in which firms, suppliers, and customers interact is the heart of microeconomic analysis. So, for example, if the additional revenue (marginal revenue) from selling another unit exceeds (falls short of) the additional cost (marginal cost) of making another unit, produce and sell more (less) to increase profit. At the maximum profit, it follows that marginal revenue is just equal to marginal cost. If that basic idea sounds mathematical, it is. In calculus, something is maximized or minimized by setting its derivative with respect to things under our control equal to 0. In this case, the derivative of profit with respect to how much is sold is 0. (So, you really do need to be comfortable with the review material in the appendix.) Chapter 3 extends the basic principles of profit maximization to consider a number of more advanced applications. While chapters 2 and 3 express practical or applied concepts and work numerical examples for given demand and cost conditions, they do not say how to go about getting empirical approximations of things like demand curves, assessing the degree of accuracy in those approximations, or understanding the ways in which those approximations can go wrong. Those are the topics of chapters 4, 5, and 6, respectively.



Chapter 7 takes a step back to consider the basic theory of individual preference and consumer behavior. That theory underpins demand analysis and lends itself to a number of other applied analyses. Chapter 8 considers some of those applications and extensions. Chapter 9 considers more complex forms of pricing – block pricing, two part pricing, and menu pricing. Chapter 10 presents the basic model of individual choice when faced with uncertainty and considers the ramifications of risk aversion. Chapter 11 considers cost in detail. Game theory, the last tool needed to complete our analytical toolbox, is the topic of chapters 12, 13, and 14. We will use game theory to model multiple firms simultaneously seeking to maximize their own profits, given their best guesses about what all the other firms are going to do, recognizing that all the other firms are doing the same thing and that every firm’s decision affects all the others. More generally, non‐cooperative game theory is used to analyze any strategic situation where the players all realize that their best play depends on what all the other players are going to do.

Part 5 of the book uses the tools developed in Parts 1 through 4 to study market structure – the ways the environment in which firms operate affects the decisions available to them, and the way the decisions of firms interacting in a market affect other firms and the market. Chapter 15 analyzes markets in which relatively large firms all produce completely identical products. In such markets, gaining a strategic advantage over competitors boils down to taking market share through having a leaner cost structure or establishing an aggressive strategy early on, since nothing but price can distinguish one product from the next. Chapter 16 considers markets where products are differentiated. While cost structure still matters, product positioning and advertising also become important elements of firm strategy. Chapter 17 considers product markets where economies of scale are low and there are few other entry barriers, so that the market can support a large number of relatively small firms. In the limiting case, such markets are perfectly competitive, which you should be familiar with from your principles of microeconomics class. When there are enough firms in a market for the perfectly competitive model to be a reasonably good approximation for whatever question an analyst wishes to answer, its simplicity is a tremendous advantage. In particular, complications arising due to strategic interdependence can be ignored. In that case, equilibrium prices and quantities and the welfare of consumers and producers may be appropriately analyzed with the simple supply and demand model. Applications of supply and demand are the topic of chapter 18. Chapter 19 closes out the study of market structure by summarizing the various models and placing them in the context of data on various U.S. industries. It also considers several specific strategies firms may pursue that are aimed at altering the nature of competition and the structure of the market itself.

While something called a “firm” has played an important role in most of the first nineteen chapters of the book, we have not looked at the reasons firms exist, or why they are structured as they are. Why don’t individuals simply specialize in the single thing they do best and then trade with one another? How does the existence and



structure of the firm, in its own right, increase value added? That is the subject of Part 6 of the book. In order to produce goods and services to sell to their customers, firms must procure inputs. Those inputs may be purchased in the spot market, the firm may contract with input suppliers, or the firm might vertically integrate and produce some of the intermediate inputs itself. Chapter 20 considers the choice between the spot market and contracting. It then moves on to consider issues that arise in contracting for inputs when the agent (the input provider) has private information that is not available to the principal (the firm). Lest I give the wrong impression, Chapter 20 barely scratches the surface of the economics of asymmetric information, but it does give the basic insights. Chapter 21 considers three problems other than information asymmetry that can cause the spot market or contracting to break down, creating a role for a vertically integrated firm. These are team production and the resulting free riding problem, relationship specific investments and the resulting hold up problem, and double marginalization.

Uses and Limits of Models

Before concluding this introductory chapter, several words of caution are in order regarding the use of models in this book. It is important that we keep in mind that they are only models. By definition, models are oversimplified representations of a few aspects of reality. They allow us to focus on a few aspects of a problem that we think are particularly important. In that way, they keep the analytical task manageable. In making any decision we must consider multiple models and carefully consider if anything has been left out of the models that might have serious consequences for our decision. As noted by MIT economist Peter Diamond, “To me, taking a model literally is not taking the model seriously.”² Failure to take models seriously can happen in two ways. Some people who forget the models are intended to be a simplified story, not a precise description of reality, reject all conclusions of economic analysis on the grounds that the models are unrealistic, never mind the fact that it would be impossible to draw any conclusions from, or even to construct, a fully realistic model. That is, they throw the baby out with the bathwater. Others try to force their actual decisions and opinions to conform narrowly to the results of their pet model, and ignore the potentially serious ramifications of things that lie outside their models. This can lead to catastrophe when rare but serious events occur that are not accounted for in the model. For example, Long-Term Capital Management collapsed in the late 1990s not because they calculated incorrectly in applying models such as the Black-Scholes-Merton option pricing model. Rather, the collapse was due to the impact of rare but serious negative external shocks outside of the model coupled with the highly leveraged positions they took as a consequence of the conclusions reached from their models.³

Consider the assumption that we can treat firms as if they maximize expected returns and ignore risk aversion (at least as a first approximation). It is useful because it simplifies matters incredibly, with benefits. First, it lets us get started

² “Taxes and Pensions,” Southern Economic Journal, 2009, 76(1), page 2.
³ http://en.wikipedia.org/wiki/Long-Term_Capital_Management



while we are developing the tools needed to study risk aversion. Second, and more importantly, it allows us to go into more detail in our analysis of other issues where the central area of concern is not the degree of risk aversion, but the nature of profit maximizing decisions and the direct impact of uncertainty on those decisions. Without this assumption, we would never be able to gain a number of insights about profit maximizing decisions. The drawback is that some can become so wrapped up in the model that they forget it is just a model. Individual managers and board members with large stakes in the company are, in fact, probably risk averse. Further, focusing on the expected profits of an individual project within a firm can miss important links to the big picture. If failure of a $50 million project bankrupts a firm with expected profits from other projects of $500 billion, the project may not be worth it even if it looks good evaluated on its own. We should always take account of the things left out of our models – like the potential to bankrupt the whole firm – before reaching any final decisions.



Chapter 1 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Adverse Selection – A case of information asymmetry in which one party’s characteristics are hidden from another party.

Asymmetric Information – A state in which one party knows more than others.

Certainty Equivalent (CE) – The amount of wealth received for certain that provides the same utility as the actual gamble. For risk neutral players, the certainty equivalent is equal to the expected value of the gamble. For risk-averse players, it is less than the expected value of the gamble.

Expected Value – The monetary value that one expects to receive from a particular gamble. This can be applied to anything that has probabilities associated with payoffs.

Incomplete Information – A state in which there is risk or uncertainty.

Marginal Cost – The cost incurred when a firm produces one additional unit of a good. It is the rate of change (or derivative with respect to quantity) of total cost. It is increasing because of the presence of a fixed factor. In order to maximize profit, this should be set equal to marginal revenue.

Marginal Revenue – The additional revenue from selling one more unit. It is the rate of change (or derivative with respect to quantity) of total revenue. In order to maximize profit, this should be set equal to marginal cost.

Moral Hazard – A case of information asymmetry in which one party’s actions are hidden from another party.

Perfect Competition – A type of market structure with many firms where each individual firm has no control over price levels; they are price takers. These firms have no strategic decisions to make, since they take their price from the market. Nothing they do will have any impact on the other firms in the market.

Perfect Information – A state in which no hidden characteristics or actions exist. The information the buyers have is the same as that of the seller, and everyone knows all information about the product at all times.

Risk Aversion – An attitude toward risk that describes someone who values a gamble at less than its expected value. A risk-averse person would take a guaranteed payoff that is less than the expected value of the gamble rather than take the gamble with the possibility of getting nothing or losing money.

Risk Neutrality – An attitude toward risk that describes someone who values a gamble at its expected value.

Risk – Occurs when the probabilities that certain events will occur are more or less objective, i.e., probabilities are known to all parties.



Risk-Loving – Describes someone who must receive a guaranteed payoff that is more than the expected value of a gamble in order to not take the gamble.

Uncertainty – Occurs when the probabilities that certain events will occur are subjective, i.e., probabilities differ depending on the party.



Appendix to Chapter 1: Math Used in Managerial Economics

Introduction

As mentioned in Chapter 1, economics attempts to develop systematic methods for modeling events and decisions concerning individuals and organizations. Because economics deals with people, it is by nature an inexact science; people are irrational, whimsical, and often unquantifiable. In order to draw any useful conclusions about these individuals, it is important that our methods be as reliable as possible. Applied mathematics is the best way we have for being consistent, accurate, and logical in developing economic theory because it is a closed system in which we are unable to contradict ourselves. So, if we equip ourselves with relevant mathematical knowledge, we will have access to useful tools that we can apply to various economic situations. This also enables us to check our economic theory against situations and data that we observe in the real world. This review is intended to address the most common mathematical concepts that students will encounter throughout the book. Any lack of understanding of these basic mathematical concepts will seriously impede the student’s ability to move forward with the economics. We cannot stress this enough: simple observation of students’ struggles in learning managerial economics highlights the importance of this material. While some of it may seem very basic, it is critical that both method and theory are fully grasped. So, please read it!

Functions and Functional Notation

The most basic mathematical concept that can help us formulate applied economic theory is a function. A function matches each element of one set to a single element of another set. The sets that we most commonly talk about are variables like price and quantity; they are just groups of several elements. Elements of the set price are simply different prices that a firm could charge, such as $2 or $5.50. Similarly, elements of the set quantity are different amounts of a good that a firm could sell, such as 100 boxes or 22.5 widgets. A function, then, is really nothing more than a relationship between two sets, such as price and quantity. There’s some additional terminology that will help us describe functions in more detail. First, let’s look at a basic function, and then use it to introduce these additional details. The following function shows a relationship between quantity q and price p:

q = 4000 − 100p

What does this function tell us about this particular relationship between price and quantity? First, we are able to find solutions to this function. A solution to a function is just a collection of values (elements) that each variable (set) takes on which make the function true. So, if we let price have a value of 1, we can determine what quantity would have to be to satisfy the function:

q = 4000 − 100(1)

q = 3900 . Thus, if price is $1, our function tells us that 3900 units will be sold. This is essentially what a function does ‐ it describes the relationship between variables over all their possible values. We just found out what the quantity would be if we chose a price of $1, but if we wanted to find out what price would be if we started by choosing a quantity of 3900, the math would be a little bit more difficult. This is because our function was solved for q – and there is a reason for this. When a function is solved for a single variable that occurs only on the left‐hand side, that variable is known as the dependent variable (quantity, in our example). As the name suggests, the dependent variable in a function depends on the other variable(s) in the function. Hence, our function implies that the quantity sold actually depends on the price that is being charged. This is why it seems more natural to choose a price first, and then solve for the resulting quantity. The variable that determines the dependent variable (price, in our example) is known as the independent variable. Again, as the name suggests, this variable is chosen independently of the function, and helps determine what value the dependent variable will take on. So, a function expresses a relationship between a dependent variable and an independent variable. In order to save us the trouble of denoting which variable is dependent and which is independent every time we introduce a function, functional notation expresses these relationships in a very general way. In the above example, q was the dependent variable and p was the independent variable – so we would say that q is a function of p. In this way, we have stated there is a relationship between the two variables, and that since q is a function of (depends on) p, we have also identified which variable is independent and which is dependent. While the statement “q is a function of p” is rather terse, there is an even shorter, more convenient way to represent this relationship: q = f ( p) . This notation is equivalent to saying that quantity is a function of price, so again we’ve communicated all the essential information about the variables. More often, rather than naming the function f (⋅) , we will write something like q = q( p) , which means (again) that the variable q is determined by a function named q, which depends on the variable p. So, using functional notation, our previous example can be written as

q(p) = 4000 − 100p

and can be referred to in a question simply as q(p) (read "q of p"). This helps to cut down on the number of letters and symbols we need to remember. So far, we've been talking about functions with one dependent variable and one independent variable. If we think about what affects the sales of a firm, price is certainly dominant, but, as is always the case in economics, it is certain that other variables also impact quantity demanded. Suppose a firm assumed that income per capita in the market also impacted quantity sold and wanted to model this additional relationship. The firm is supposing that quantity depends on income, in addition to price. So, income (M) is an additional independent variable. Perhaps the function will look like the following:

q = 4000 − 100p + 50M .

Thus, if a function has more than two variables, only one will be the dependent variable, and the others will be the independent variables. The notational difference is intuitive: here, quantity is a function of price and income, or q = q(p, M). Finally, we introduce notation for dealing with multiple prices or quantities. Suppose a company sells its product in two different states, or sets multiple prices for different customers, and wants to differentiate between these groups. In order to identify which price and quantity are associated with which group, we use subscripts on our variables. So, q1 may denote the quantity sold in Florida, while q2 denotes the quantity sold in New York; or pH could represent the price charged during high-demand hours, and pL the price charged during low-demand hours. In the text, subscripts will always be used to differentiate variables according to location, time, or some other dimension, while superscripts will be reserved for exponents of variables.
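As an aside for readers who like to experiment, functional notation maps directly onto code. The short Python sketch below is our illustration, not part of the original text; the function names are arbitrary.

    # Demand as a function of price alone: q(p) = 4000 - 100p
    def q(p):
        return 4000 - 100 * p

    # Demand as a function of price and income: q(p, M) = 4000 - 100p + 50M
    def q_multi(p, M):
        return 4000 - 100 * p + 50 * M

    print(q(1))            # 3900: the solution found above
    print(q_multi(1, 10))  # 4400: higher income shifts quantity up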

Equation of a Line

Slope

One of the most common mathematical tools we will use throughout the book is a line and its equation. Not only does the equation of a line lend itself to rigorous quantitative application, the graphical representation is able to communicate economic relationships between variables clearly and effectively. It is crucial to understand the basic components of a line, so let's start with a simple example using a generic x/y coordinate plane.

The first thing we notice about this line is that it is sloping down; that is, as we start from where the line crosses the Y Axis at 8 and increase the x value (move right), the y value decreases (moves down). When a line looks like this, its slope is said to be negative. The slope of a line immediately tells us a lot about the line.

Without knowing anything else, we can conclude that there is an inverse relationship between the y variable and the x variable: as one increases, the other decreases, and vice versa. This type of relationship will have several implications when we begin to apply lines to economic theory. Of course, there are many lines that have negative slopes. For example,

both lines L1 and L2 have negative slopes. How can we differentiate between these two lines? The slope of a line is actually a specific number that we can calculate. The definition of the slope of a line, which we will usually refer to generically as b, is the change in the dependent variable per unit change in the independent variable. That may sound like a complex definition at first, but think back to what dependent and independent variables are – they are simply names for the variables that we are describing a relationship about. Since a line is just a relationship between two variables, the x and y variables are the independent and dependent variables. But which one is which? By convention, the variable on the Y Axis is the dependent variable, and the variable on the X Axis is the independent variable. This will be true when we start introducing different economic variables, but for now, since we’re using a simple example, y is our dependent variable and x is our independent variable. Now that we know which variable is independent and which is dependent, let’s revisit the definition of slope. The slope of a line, again, is the change in the dependent variable per unit change in the independent variable. Since we’ve identified our variables, we can rewrite this definition: The slope of a line is the change in y per unit change in x. This definition seems much more manageable. In order to translate it into a mathematical expression, first we need to understand exactly what it’s saying. If we had $10 to spend and we bought 2 apples, how would we find out how much we spent per apple? We would divide the $10 by the 2 apples and find that we spent $5 per apple. Similarly, to find the change in y per unit change in x, we want to divide the change in y by the change in x. By dividing, we will have a number that represents how much y changes each time x increases by a single unit. How do we find the change in x and the change in y on a graph? The change in a variable is just the difference in its beginning and ending values over some interval. So, the change in x is x1 minus x0, where x0 is the starting x‐value and x1 is the ending

x‐value. Likewise, the change in y over some interval of a line is y1 minus y0. Note that the change in a variable can either be negative or positive, depending on the beginning and ending points of the interval. Since we know how to mathematically represent changes in our variables, we can rewrite the definition of slope again. The slope of a line (b) is the change in y per unit change in x, or

b = (y1 − y0)/(x1 − x0) .    (1.1)

Remember, the two values for each variable that we use to find the change are over an interval of the line; when calculating slope, we need to use the same interval when finding the change for each variable. We also will often use the notation Δ (read delta) to denote “change in”, which means we can write slope in an even simpler way:

b = Δy/Δx .    (1.2)

Now, let’s revisit our first line and calculate its slope.

To make the calculation easy, let’s use the entire line as our interval. We need to calculate the change in y over the entire line and divide it by the change in x.

Starting at point a, the beginning y value (y0) is 8 and at point b, the ending y value (y1) is 0. Similarly, x0 is 0 and x1 is 4. This means the slope is

b = Δy/Δx = (0 − 8)/(4 − 0) = −2 .

Let’s apply this number to our original definition of slope. The slope is the change in y per unit change in x. So each time x increases by one, y decreases by two. This is what the slope tells us. To see how this applies to economics, lets change the variables in this example to price and quantity. Usually, we write price on the Y Axis and quantity on the X Axis. So, price is assumed to be our dependent variable, and quantity our independent variable.

Because the slope is ‐2, we know if quantity were to increase by one unit, price would have to decrease by two units. If this line represented a demand function, a manager would know that in order to sell another unit, he would have to lower his price by $2. This is a simple example of how lines can be applied to economic theory.
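If you want to verify slope calculations like this one numerically, a two-point slope function is enough. This sketch is purely illustrative:

    # Slope of a line through two points: b = (y1 - y0) / (x1 - x0)
    def slope(x0, y0, x1, y1):
        return (y1 - y0) / (x1 - x0)

    # The line in the figure runs from (0, 8) to (4, 0)
    print(slope(0, 8, 4, 0))  # -2.0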

Equation

Once we have calculated the slope of a line, writing an equation for the line is easy. The slope-intercept form of a line is

y = a + bx ,

(1.3)

where a is the value of y when x equals 0 (i.e. the y‐intercept), and b is the slope. Let’s write the equation of our first line.

It is clear that the y‐intercept is 8 because when x is 0, y takes a value of 8. Since we know the slope of our line is ‐2, the equation for our line is

y = 8 − 2x .

If we wanted, we could solve this line for x, obtaining x = 4 − 0.5y. It is important to understand that both of these equations represent exactly the same line and thus the same specific relationship between the variables y and x. However, since x is now on the left-hand side, we may be tempted to re-label which variable is independent and which is dependent. We will encounter situations similar to this often throughout the book. For example, whether price determines quantity or quantity determines price depends on the type of firm, the market, and many other factors. More often than not, they both determine each other. Because of this intrinsic quality of the marketplace, variables may not have an exact category, and will frequently be labeled independent in one scenario and dependent in the next.

Solving Two Linear Equations

Often we will want to calculate the intersection of two lines. Any two distinct, non-parallel lines cross at a single point:

This point (or solution) is an ordered pair of values that satisfy both equations. There are several ways to find the solution, but one of the simplest and most consistent is to use substitution. Given two equations, y = f(x) and y = g(x), their intersection (x*, y*) can be found as follows:

i) Set f(x) = g(x) and solve for x*.
ii) Substitute x* into either y = f(x) or y = g(x) and solve for y*.

To illustrate using the method of substitution to solve two linear equations simultaneously, let's find where the equations y = 8 − 2x and x = −2 + y intersect. First, we need to get the second equation in the form y = g(x) by solving for y:

x = −2 + y → y = 2 + x . Now, as per the above points, we can set 8 − 2x = 2 + x and solve for x*:

8 − 2x = 2 + x
6 = 3x
x* = 2

To find y*, substitute x* into either equation. Let's use the first one:

y = 8 − 2(2)

y* = 4

Therefore, (2, 4) is the solution for the system of equations y = 8 − 2x and x = −2 + y.
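The same substitution method can be checked with a computer algebra system. A sketch, assuming the sympy library is installed (an optional aside, not required for the course):

    import sympy as sp

    x, y = sp.symbols('x y')
    # The two lines from the text: y = 8 - 2x and x = -2 + y
    solution = sp.solve([sp.Eq(y, 8 - 2*x), sp.Eq(x, -2 + y)], [x, y])
    print(solution)  # {x: 2, y: 4}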

Area of a Rectangle and a Triangle

We will frequently come across geometric shapes in our graphs that represent areas of interest. For example, the rectangle in the following graph represents a firm's profit:

To calculate how much profit this firm is making, we need to calculate the area of this rectangle. The area of a rectangle is

A = bh ,

(1.4)

where b is the base of the rectangle and h is the height. In our example, the base is q and the height is ( p − c) , so the firm’s total profit is π = ( p − c)q .

Another common geometric shape that we will encounter in economics is a triangle. The shaded triangle below shows the deadweight loss in a market as a result of a tax:

Since a right triangle is just a rectangle cut in half along its diagonal, the area of a triangle is

A = bh/2 ,    (1.5)

where b is the base of the triangle and h is the height. So, the total deadweight loss for this market is DWL = (q0 − q1)((p + t) − p)/2 .
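Both area formulas are easy to wrap in small functions. The parameter values below are made up purely for illustration:

    # Profit rectangle: area = base * height = (p - c) * q
    def profit(p, c, q):
        return (p - c) * q

    # Deadweight-loss triangle: base (q0 - q1) times height t, divided by 2
    def deadweight_loss(q0, q1, t):
        return (q0 - q1) * t / 2

    print(profit(p=5, c=3, q=100))              # 200
    print(deadweight_loss(q0=100, q1=80, t=2))  # 20.0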

Exponents, Exponential Functions, and Logarithms

When taking account of a production facility's returns to scale, or making assumptions about the elasticity of a market or compound growth, exponents, exponential functions, and logarithms can be very useful. Thus, it is important to know all of the transformational rules concerning these concepts, as they will show up throughout the book. Let's start with a basic power function where our independent variable, x, is raised to some exponent, a. What is x^a? a is just the number of times you multiply 1 by x. x^1 is the same as 1·x, x^2 is 1·x·x, and so on. x^0 means multiply 1 by zero x's, so x^0 = 1. What is x^(−a)? It is the opposite of multiplying 1 by x a times. Dividing is the opposite of multiplication, so a is the number of times you divide 1 by x. x^(−1) is 1/x, x^(−2) is 1/(x·x), and so on. All of the standard "rules" for exponents just follow from this definition.

i) x^(a+b) = x^a · x^b
ii) (x^a)^b = x^(ab)
iii) x^(−a) = 1/x^a
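These rules are easy to sanity-check numerically (the particular numbers below are arbitrary and chosen so the comparisons are exact):

    x, a, b = 2.0, 3, 4
    print(x**(a + b) == x**a * x**b)  # rule i
    print((x**a)**b == x**(a * b))    # rule ii
    print(x**(-a) == 1 / x**a)        # rule iii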

Now that we understand how the basic power function and exponents work, we note that the power function can be generalized to

f(x) = kx^b ,    (1.6)

where k and b are just constants. The constant k in the general power function just represents a scaling factor. The general exponential function moves the independent variable into the exponent of the function and is written as

f(x) = ae^(bx) ,    (1.7)

where a and b are constants and e is approximately 2.71828. But what is e, and what is special about it? It is the base unit of continual growth processes. Imagine a process that doubles itself every time period, so that the interest rate is r = 1. At the end of one period, the future value of 1 unit growing for 1 period is FV = (1 + 1)^1 = 2. Of course, it is doubling! Now imagine compounding that 100% per period growth n times over the period. Then FV = (1 + 1/n)^n. Suppose we compound monthly: FV = 2.61304. Daily? FV = 2.71456. Hourly? FV = 2.71812. By the minute? 2.71828. With continuous growth, each little bit of growth starts growing as soon as it emerges. In the limit, as we approach continuous growth, the FV of one unit growing continuously for one period of time is e. If we let it grow for 2 periods, we have FV = e·e = e^2. For x periods, e^x. Suppose it grows for one period at r = 2. We could think of it as 100% growth occurring twice, so FV = e·e = e^2. If it grows for one period at rate x, we have FV = e^x. Generally, then, if an initial unit grows continuously at rate r for t periods, FV = e^(rt). If the initial amount is PV, instead of 1, we just have FV = PV·e^(rt). Thus, the general exponential function gives the future amount, y, that started as an initial amount, a, and then grew exponentially, or in a compound fashion, at rate b for each unit increase in x, so y = ae^(bx). So, e has tons of applications with any natural growth process, or for modeling any variable that is affected in an exponential or compound way by another variable. As it happens, we will not use it a lot in this class, but we will use it a time or two. More importantly, you can't understand natural logs without e, and we will use natural logs often. So, what is a natural log? The natural logarithm of y, ln(y), is the power to which e ≈ 2.71828 must be raised to yield y. So, if x = ln(y), then y = e^x. So, the natural log undoes exponentiation; that is, it is the inverse of the exponential function. Fine, but, intuitively, what is the natural log? Since e^x is the amount into which one unit grows after growing continuously for one period at rate x, x periods at rate r = 1, or t periods at rate r where rt = x, ln(y) is the combination of growth rate and growth time, rt, needed for one unit to grow continuously into y units. More generally, if y increases from an initial amount, a, at an exponential rate of b with increases in x, ln(y/a)/b gives the value of x needed for the initial amount to grow to y units.
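The compounding story behind e can be replayed in a few lines of Python (an illustrative aside):

    import math

    # Compounding 100% growth n times per period approaches e as n grows
    for n in [12, 365, 8760, 525600]:  # monthly, daily, hourly, by the minute
        print(n, (1 + 1/n) ** n)

    print(math.e)  # 2.718281828..., the continuous-growth limit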

The following properties of natural logarithms follow directly from the definition of the natural log and from the basic rules for exponents given above.

iv) ln(xy) = ln(x) + ln(y)
v) ln(x/y) = ln(x) − ln(y)
vi) ln(x^a) = a ln(x)
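Again, a quick numerical check of these properties (values chosen arbitrarily):

    import math

    x, y = 3.0, 7.0
    print(math.isclose(math.log(x * y), math.log(x) + math.log(y)))  # rule iv
    print(math.isclose(math.log(x / y), math.log(x) - math.log(y)))  # rule v
    print(math.isclose(math.log(x ** 5), 5 * math.log(x)))           # rule vi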

Solving Two Non-Linear Equations

Any function whose output is not proportional to its input is called non-linear. All of the functions we have looked at so far have been linear, but whenever a logarithm or an exponent (other than 1 or 0) is present and interacting with one of the variables, the equation becomes non-linear. Non-linear equations, like linear equations, can represent demand curves, cost functions, and many other economic concepts. So, it will often be desirable to solve two non-linear equations simultaneously, just as it was with linear equations. The process for solving a system of non-linear equations is similar to solving a system of linear equations, except solving for the final answer is often more tedious. To demonstrate, let's solve the following system of non-linear equations:

2x + y^2 − y = 4
y^2 − 3 = x .

In our previous section dealing with linear equations, we followed a general rule of solving each equation as a function of x alone, and then setting these two functions equal to each other. We could do that here, but looking at the second equation, we can see that it's already solved for x as a function of y alone. This means we can substitute it in for x in our first equation, and solve the remaining equation for y:

2(y^2 − 3) + y^2 − y = 4
2y^2 − 6 + y^2 − y = 4
3y^2 − y − 10 = 0
(3y + 5)(y − 2) = 0
3y + 5 = 0 or y − 2 = 0
y = −5/3 or y = 2 .

We can now plug these two solutions into either of the original equations to find the x values:

(−5/3)^2 − 3 = x        (2)^2 − 3 = x
25/9 − 3 = x            4 − 3 = x
x = −2/9                x = 1

So our solution set, which consists of two ordered pairs, is {(−2/9, −5/3), (1, 2)}. Occasionally, we will end up with a quadratic equation that does not factor as neatly as it has done here. In that case, it may be necessary to use the quadratic formula. For an equation of the form ax^2 + bx + c = 0, the quadratic formula tells us the solution(s) are

x = (−b ± √(b^2 − 4ac)) / (2a) .    (1.8)
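For systems like the one above, a computer algebra system reproduces both the factoring approach and the quadratic formula. A sketch, assuming sympy is available:

    import sympy as sp

    x, y = sp.symbols('x y')
    # The system from the text: 2x + y^2 - y = 4 and y^2 - 3 = x
    solutions = sp.solve([sp.Eq(2*x + y**2 - y, 4), sp.Eq(y**2 - 3, x)], [x, y])
    print(solutions)  # [(-2/9, -5/3), (1, 2)]

    # The quadratic formula applied to 3y^2 - y - 10 = 0
    a, b, c = 3, -1, -10
    disc = (b**2 - 4*a*c) ** 0.5
    print((-b + disc) / (2*a), (-b - disc) / (2*a))  # 2.0 and -1.666...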

To illustrate how logarithmic transformations may be necessary, let’s solve the following system:

ln(q) = 3 + 2 ln(p)
ln(q) = 4.5 + 1.6 ln(p)

Observing that both equations are already solved for ln(q), we can set the right side of each equation equal to the other:

3 + 2 ln(p) = 4.5 + 1.6 ln(p)
0.4 ln(p) = 1.5
ln(p) = 3.75

Since the natural log is the inverse of the exponential function, raising e to each side gives

p = e^3.75 ≈ 42.52 .

To find q, plug p into either original equation:

ln(q) = 3 + 2 ln(42.52)
ln(q) ≈ 3 + 7.5 = 10.5
q ≈ e^10.5 ≈ 36,315.5 .
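The whole log-linear solution can be verified numerically with nothing more than the standard math module (an illustrative aside):

    import math

    # ln(q) = 3 + 2 ln(p) and ln(q) = 4.5 + 1.6 ln(p)
    ln_p = (4.5 - 3) / (2 - 1.6)  # 0.4 ln(p) = 1.5, so ln(p) = 3.75
    p = math.exp(ln_p)
    q = math.exp(3 + 2 * ln_p)
    print(p, q)  # roughly 42.52 and 36315.5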


Definition of Derivative, Relationship to Max/Min

Definition

When dealing with linear equations (lines), we introduced the concept of slope, and were able to calculate this value with relative ease. The slope of a line revealed information about the rate at which the two variables changed – for example, when x increased by 1, y decreased by 2. This was also true for every interval on the line. How do rates of change apply to non-linear equations? Above, we defined a non-linear equation as a function whose output is not proportional to its input; graphically, this amounts to any curve that is not a line. Take the following function f(x):

It is clear that this function does not have a constant slope, as lines do. The slope changes based on what part of the function we’re looking at. This is why discussing rates of change as they apply to non‐linear functions requires a more sophisticated concept: the derivative. Before defining what a derivative is, let’s take another look at our function. Suppose we wanted to find the average rate of change between two points, a and b:

Between these two points, y increases by y1 − y0 ( Δy , the change in y) and x increases by x1 − x0 ( Δx ). So, the average rate of change is simply

Δy/Δx = (y1 − y0)/(x1 − x0) .


This is merely the slope of the thick dotted line between points a and b. Now, if we rewrite y1 as y0 + Δy , we can express the average rate of change as

Δy/Δx = ((y0 + Δy) − y0)/Δx .

Using functional notation, this becomes

Δy/Δx = (f(x0 + Δx) − f(x0))/Δx .

But how can we use this to find what the rate of change is at each individual point? Suppose we moved x1 closer to x0, decreasing Δx. If we continue shrinking Δx until it is infinitesimal, this is what our average rate of change would look like between the two points:

The line segment between the two points comes closer and closer to being the line that is tangent to the curve at point a. The slope of this line segment, then, converges to the slope of the tangent at a as Δx approaches zero. This leads to the formal definition of a derivative, denoted dy/dx, as the limit of the average rate of change as the change in the independent variable approaches 0:

dy/dx = lim (Δx→0) [f(x0 + Δx) − f(x0)] / Δx    (1.9)

In essence, the derivative of a function at a point is the rate of change of y with respect to small changes in x; it captures how fast the curve is changing at that point. Since the derivative is the slope of the tangent, it is clear that for any non‐linear function, the derivative will change based on where it is being taken:

In fact, the converse of this is true as well: given any linear function, its derivative will be constant along the entire function. This is because the derivative of a linear function is simply its slope.
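The limit definition can be watched in action by shrinking Δx numerically. An illustrative aside, using f(x) = x² as an example function:

    # Difference quotient converging to the derivative as dx shrinks
    def f(x):
        return x ** 2

    x0 = 3
    for dx in [1.0, 0.1, 0.001, 0.00001]:
        print(dx, (f(x0 + dx) - f(x0)) / dx)  # approaches f'(3) = 6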

Max/Min

The derivative of a function can also help us identify when we are at a "peak" or a "valley"; that is, when a function is being maximized or minimized. Consider the following function:

Since the derivative is the slope of the tangent at a specific point on a curve, and the slope of a horizontal line is zero, it is clear that if the derivative is zero for a given critical point, that point will be a maximum of the curve. What if a function has a local minimum, in addition to a local maximum?

We can see from the above figure that the derivative will also be zero at local minima. How can we tell whether we're at a maximum or a minimum? First, we know a given x value is a candidate for a maximum or a minimum if the first derivative at that value is zero; this is known as the First Order Condition (FOC). If the FOC holds, we can check whether the point is a minimum or a maximum by looking at the curvature of the function at that point – which is given to us by the second derivative. If the second derivative is negative, x is a local maximum, and if the second derivative is positive, x is a local minimum. This is known as the Second Order Condition (SOC), and by it we can tell whether we are maximizing a function or minimizing it.
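The FOC/SOC recipe can be automated. A sketch, assuming sympy is available and using a cubic invented for illustration:

    import sympy as sp

    x = sp.symbols('x')
    f = -x**3 + 6*x**2 - 9*x  # a function with one valley and one peak

    critical = sp.solve(sp.diff(f, x), x)  # FOC: f'(x) = 0
    for c in critical:
        soc = sp.diff(f, x, 2).subs(x, c)  # SOC: sign of f''
        print(c, 'maximum' if soc < 0 else 'minimum')  # 1 is a min, 3 is a max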

Derivative Rules

We've introduced the concept of a derivative and how it relates to local maxima and minima of a function. But how do we actually calculate the derivative of a function? Depending on the given function, the rules required to produce that function's derivative vary. In this section, we discuss some common rules for derivatives and how they apply to general cases. Far and away, we will make the most use of the power rule:

i. y = ax^b ⇒ dy/dx = bax^(b−1)

For example, the derivative of

y = 4x^3

is

dy/dx = 12x^2 .

Three common special cases are:

ii. y = a ⇒ dy/dx = 0
iii. y = a + bx ⇒ dy/dx = b
iv. y = a/x^b ⇒ dy/dx = −ba/x^(b+1)

Based on these rules, the derivative of

y = 3 + 2/x^0.5

is

dy/dx = −1/x^1.5 .

We will use the sum rule and the product rule often. The sum rule is

v. y = f(x) + g(x) ⇒ dy/dx = f'(x) + g'(x) ,

which says that the rate of change in y is the rate of change in f(x) plus the rate of change in g(x). So, the derivative of

y = 3x + 4x^2

is

dy/dx = 3 + 8x .

The product rule is

vi. y = f(x)g(x) ⇒ dy/dx = f'(x)g(x) + g'(x)f(x) ,

which says that when finding the rate of change in y, the rate of change in f(x) gets multiplied by g(x) since y depends on the product of f(x) and g(x). Similarly, the impact of changes in g(x) are multiplied by f(x). The total rate of change, then, is the sum of the rates of change due to each of these parts. Thus, the derivative of

y = (9 − x^2)(2x)

is

dy/dx = −2x(2x) + (9 − x^2)(2)
      = −4x^2 + 18 − 2x^2
      = −6x^2 + 18 .

A special case is the quotient rule:

vii. y = f(x)/g(x) = f(x)(g(x))^(−1) ⇒ dy/dx = [f'(x)g(x) − f(x)g'(x)] / g(x)^2 ,

so the derivative of

y = 2x^2/(2x − 1)

is

dy/dx = [4x(2x − 1) − 2x^2(2)] / (2x − 1)^2 .

Less often, we will use exponential and logarithmic functions:

viii. y = a ln(x) ⇒ dy/dx = a/x
ix. y = ae^(bx) ⇒ dy/dx = bae^(bx)

So, the derivative of

y = 2 ln(x)

is

dy/dx = 2/x

and the derivative of

y = 3e^(2x)

is

dy/dx = 6e^(2x) .

What if one variable depends on another that is a function of a third variable? For example, cost depends on quantity, but quantity depends on price. For this, we need the chain rule. If z = g(y) and y = f(x),

x. z = g(f(x)) ⇒ dz/dx = g'(f(x)) f'(x) ,

which says that the rate of change in z with respect to x is the rate of change in z with respect to y, times the rate of change in y with respect to x. For example, the derivative of

y = (4 − 0.5x)^2

is, using first the power rule and then the chain rule,

dy/dx = 2(4 − 0.5x)(−0.5) = −(4 − 0.5x) .

(Note the factor of −0.5: it is the derivative of the inside function, 4 − 0.5x.)
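All of rules i through x can be checked with symbolic differentiation. For example, assuming sympy is available:

    import sympy as sp

    x = sp.symbols('x')
    print(sp.diff(4 * x**3, x))            # power rule: 12x^2
    print(sp.diff((9 - x**2) * 2*x, x))    # product rule: 18 - 6x^2
    print(sp.diff(2*x**2 / (2*x - 1), x))  # quotient rule
    print(sp.diff((4 - 0.5*x)**2, x))      # chain rule: -(4 - 0.5x)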

Partial Derivatives

Up to this point, we have talked about differentiation in the context of two variables, one dependent and one independent. To measure the rate of change of the dependent variable with changes in the independent variable, we can use the rules described above. But most interesting phenomena in economics depend on more than one variable. For instance, a manager may find that the quantity his firm is able to sell depends not only on price, but also on income. When a function has more than one independent variable, the rules for differentiation are the same, but the notation is slightly different. Take, for example, the following function:

y = 5x + 3z

This function has one dependent variable (y) and two independent variables (x and z). Since the rules for differentiation listed above apply to equations with two variables only, we cannot apply them directly here. Recall, however, what a derivative measures – the rate of change between two variables. If we can mimic an equation that has only one independent variable by holding the second one constant, we'll have an equation with two variables and thus we'll be able to apply the rules of differentiation. So, if we were to hold z constant in the above equation, we could take the derivative of y with respect to x only. Since we are taking the derivative with respect to only one variable at a time, this is called the partial derivative of y with respect to x, and is denoted ∂y/∂x. When finding a partial derivative, we are looking for the rate of change of the dependent variable for small changes in only one of the independent variables. Thus, the other independent variables are treated as constants in this process (they are not changing). Partial derivatives are then found by applying the standard rules for differentiation, treating the other variables as constants. Two simple examples follow:

i. y = ax + bz ⇒ ∂y/∂x = a and ∂y/∂z = b
ii. y = ax^b z^c ⇒ ∂y/∂x = abx^(b−1)z^c and ∂y/∂z = acx^b z^(c−1)

Based on these rules, the partial derivatives of

y = 3x^2 + 4xz + z

are

∂y/∂x = 6x + 4z and ∂y/∂z = 4x + 1 .
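Partial derivatives work the same way symbolically; sympy's diff simply treats the other symbol as a constant:

    import sympy as sp

    x, z = sp.symbols('x z')
    y = 3*x**2 + 4*x*z + z
    print(sp.diff(y, x))  # partial of y with respect to x: 6x + 4z
    print(sp.diff(y, z))  # partial of y with respect to z: 4x + 1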

Chapter 2 Cost, Demand, and Profit Maximization

As discussed in Chapter 1, profit is simply the difference between revenue and cost. Revenues are determined on the demand side and costs are determined on the production side. Therefore, before studying profit-maximizing decisions, we need to know a bit about the way economists model cost and demand. This chapter will touch on those models briefly, and then turn to profit maximization. We will consider models of demand and cost in more detail in later chapters.

Cost, Its Determinants, and Marginal Cost

In order to produce products to sell to its customers, a firm must procure inputs. Expenditures for these inputs are costs. A firm's cost function, C(q), represents the minimum possible total cost of producing q units of output. Thus, in using a cost function to model cost, we are assuming two things. First, we assume there is no simple waste of inputs. In other words, we assume that the firm is able to use all of its facilities and labor exhaustively. Second, we assume the firm chooses the most efficient production technique, or combination of inputs, for producing that level of output. So, a firm has already figured out the optimum amount of its different input types (labor, plants, etc.) and uses them accordingly. How a firm makes this decision about how much of each input to use is based on the prices of each input, what technology is available, how much time is available for production, and myriad other factors. Optimizing inputs will be discussed in greater detail in Chapter 6. In general, the notation for a firm's cost function is C = C(q; w, r, z), which says cost depends on the quantity produced, q; the wage rate, w (what a firm pays its labor); the interest rate or rental rate, r (what it pays for investments in plants and equipment, i.e., capital); and any other variables that affect cost, lumped into the single variable z. Sometimes it is important, or just convenient, to distinguish between fixed costs and variable costs. Fixed costs are those that do not depend on the output of the firm. Fixed costs are inherently a short run concept. Over a short time span, costs such as the lease on office space or the payment on a loan for plant construction are fixed, regardless of what level of output is chosen. However, with more time, the lease need not be renewed or the plant can be sold or expanded. So, given enough time, there are no completely fixed costs. Variable costs are the costs that increase as the firm increases its output. The cost of producing an additional unit is the firm's marginal cost. In other words, it is the rate at which total cost changes. Thus, marginal cost is defined as the derivative of total cost, or

MC = dC/dq .    (2.1)


Upon taking the derivative of the total cost function, it is clear that the fixed component of cost will fall out since it never changes. Thus, marginal cost may be viewed as either the change in total cost or the change in total variable cost when one more unit is produced. For this reason, marginal cost may seem like the same thing as variable cost, or, perhaps, variable cost per unit of output; this is a misconception. Marginal cost is simply the cost of producing an additional unit of output. It does follow immediately from these definitions that the sum of the marginal cost of each unit that has been produced is equal to the total variable cost. The graphs to the right show a total cost curve and a marginal cost curve. These have the "typical" textbook shape. First, cost rises with output at a decreasing rate, meaning marginal cost is falling. This might reflect increasing opportunities for specialization. Then, at some point, some form of diminishing returns sets in and marginal cost starts to rise, meaning total costs increase at an increasing rate. However, cost curves need not always have this "typical" textbook shape. When we actually want to estimate a cost function, or to specify one for a practice, homework, or exam problem, we have to be more specific mathematically. In practice, we will use four alternative functional forms to approximate cost functions. They are:

1) C(q) = F + cq, where c > 0 and F ≥ 0, for which MC = c;
2) C(q) = F + cq^d, where c > 0, d > 0, and F ≥ 0, for which MC = cdq^(d−1);
3) C(q) = F + aq + bq^2, where a > 0, b > 0, and F ≥ 0, for which MC = a + 2bq; and
4) C(q) = F + aq + bq^2 + cq^3, where a > 0, b < 0, c > 0, and F ≥ 0, for which MC = a + 2bq + 3cq^2.

The first approximation is perhaps the simplest. It simply assumes total cost is the sum of a fixed component, F, and a variable cost component that is constant per unit produced at c. The second allows for either increasing or decreasing marginal cost, depending on whether d > 1 or d < 1. If d = 1, we just get back the first approximation, which is obviously just a special case of the second. The third approximation gives a linear and increasing marginal cost. Finally, for the right parameter values, the fourth approximation gives the "typical" textbook case where marginal cost first falls then rises. These are all just approximations to be used in models. Which is more appropriate to use depends on the particulars of the situation under study, and is a matter to be decided using both data and careful judgment.
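The four approximations are easy to compare in code. The parameter values below are invented purely to show the shapes, not estimates from data:

    # Marginal costs implied by the four cost approximations
    def mc_constant(q, c=2.0):
        return c                             # from C = F + cq

    def mc_power(q, c=2.0, d=1.5):
        return c * d * q ** (d - 1)          # from C = F + cq^d

    def mc_quadratic(q, a=1.0, b=0.05):
        return a + 2 * b * q                 # from C = F + aq + bq^2

    def mc_cubic(q, a=5.0, b=-0.6, c=0.03):
        return a + 2 * b * q + 3 * c * q**2  # from C = F + aq + bq^2 + cq^3

    for q in [1, 5, 10, 20]:
        print(q, mc_cubic(q))  # falls, then rises: the "typical" shape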

Demand and Its Determinants, Inverse Demand

When a firm sets a price for its product, it is ultimately making a decision about how many consumers will buy the product; this is because some customers value the product enough to pay a high price and others are only willing to pay a low price. The quantity consumers will purchase, then, is a function of the price charged, and is represented

q = q(p) .

In general, if the quantity is some "total" quantity, such as total output of an industry, it will be denoted as Q; otherwise, the quantity will represent that of an individual firm and will be denoted as q. Many factors besides price influence consumers' buying decisions. Consider the decision to buy a car. Your annual income will weigh heavily on which car you choose, as will the type of terrain that surrounds your home city, the prices of other cars in the same category as the one you are considering, and countless other factors. For some purposes, for example when we wish to focus on how to choose a profit-maximizing price or production level, it is convenient to ignore such factors. In other cases, we may want to take explicit note of the impact of such other factors. The notational differences are intuitive: if quantity depends on price, as well as income (m), the prices of substitutes and complements (pS and pC), and the size of the market or the number of consumers (n), and we let z represent other variables that affect demand but aren't being explicitly measured, then we express quantity as a function of all of these variables:

q = q(p, m, pS, pC, n, z) .

Later, in Chapter 7, we will explore the theory underlying demand curves in more detail. For now, we focus on understanding and using the simple notion that the quantity demanded can be expressed as a function of these variables. Representing demand in this way implies that quantity depends on price, among other things. Since economics is an application of science to the real world, it is often the case that the variables within a system determine each other, as opposed to being either exclusively dependent or independent. For this reason, it is sometimes sensible (or simply more convenient) to represent the price that's being charged as a function of the quantity the firm wants to sell. When the relationship is expressed this way, it is called inverse demand. Consistent with our earlier notation for demand, if we want to account for the dependence of inverse demand on several variables, it could be represented as

p = p(q, m, pS, pC, n, z) .

If we want to focus on the relationship between price and quantity alone and suppress the other variables, it is represented as

p = p(q) .

When we illustrate demand curves with price on the vertical axis and quantity on the horizontal, as in the figure to the right, we are actually drawing an inverse demand curve. In the figure, the demand curves follow the law of demand. That is, at higher prices, the quantity demanded is lower. The shift from demand curve d0 to demand curve d1 illustrates the impact of one of the other variables that affect demand. The increase in demand might be due to an increase in income (assuming the good is normal), an increase in the number of consumers, an increase in the price of a substitute, or a decrease in the price of a complement. Any of those changes would cause the quantity demanded to increase at any given price; thus, the whole curve shifts right.

Measuring the Sensitivity of Quantity Demanded to Price

We know that if the price of a product changes, it affects the quantity demanded. More specifically, the law of demand tells us that if price falls, quantity demanded rises, and vice versa. But what about the rate at which demand rises and falls? The slope of the demand curve tells us how fast quantity changes with respect to price. If we are given a demand curve q(p), the slope is simply the derivative of quantity with respect to price:

dq(p)/dp < 0 .    (2.2)

Consider the graph of a demand curve to the right. Every time price falls by one dollar, quantity demanded increases by two. This change in quantity on a per-dollar basis is defined as the slope of the demand curve. Since price and quantity are inversely related on a demand curve, this value will always be negative. In our example, the slope of the demand curve is −2. Since demand is linear, the slope is the same all along the curve. If demand were non-linear, the slope would change depending on what price were being charged, but it would still always be a negative value. Notice that our calculation of a slope of −2 was inconsistent with the definition of the slope of a line that you are probably familiar with. If you take price to be the "y" axis and quantity to be the "x" axis and define slope as rise over run, or the change in y divided by the change in x, you would find a slope of −½. Referring back to the graph, if we were to look instead at a single unit increase in quantity, we could infer that price falls by $0.50. This change in price per unit increase in quantity is the slope of the inverse demand curve, which is what is shown in the figure above. It is in direct accord with our understanding of the slope of the line in the figure above – the slope of the inverse demand curve is −½. The slope of the inverse demand curve tells us how fast price changes with respect to quantity. If we are given an inverse demand curve p(q), the slope of the inverse demand is just the derivative of price with respect to quantity:

dp(q)/dq < 0 .    (2.3)

Again, the slope of the inverse demand curve can change depending on what price is being charged, but it will always be negative. It is often desirable to compare demand responsiveness across several firms, regions, nations, or even time periods. Yet, the units in which both prices and quantities are quoted vary. For example, prices may be in dollars, cents, yen, or euros, and quantities may be in ounces, pounds, grams, dozens, hundreds, or thousands. Since slope depends on the units in which both price and quantity are quoted, it can be an inconvenient way to summarize the price sensitivity of demand. Elasticity measures demand responsiveness in percentage terms, making it units‐free. Because elasticity is units‐free, it can be easily used to compare demand across multiple firms, industries, locations, or time periods. The elasticity of demand with respect to price, denoted η (eta), is defined as the percentage change in quantity relative to the percentage change in price, or

η = %Δq / %Δp = (Δq/q) / (Δp/p) = (Δq/Δp)(p/q) .    (2.4)

In equation (2.4), Δq is the change in quantity and Δp is the change in price. Elasticity can be measured over an interval of prices and quantities, or at a single price and quantity. If it is measured over an interval, Δq and Δp are the differences between the endpoints of the interval (although we would then need to decide which point along that interval to use for p and q). If it is measured at a single point, Δq and Δp are assumed to be infinitesimal, so this fraction becomes the rate of change of quantity with respect to price at that point, which is simply the derivative. Elasticity then becomes

η = (dq/dp)(p/q) .    (2.5)

This is simply the slope of the demand curve times price divided by quantity. Since there is an inverse relationship between price and quantity on a demand curve, the first term in the equation for elasticity will always be negative, and thus elasticity of demand will always be negative. In general, the more negative elasticity of demand is, the more responsive customers are to changes in price. When elasticity of demand is between 0 and −1, demand is considered to be inelastic; when it is −1 exactly, demand is said to be unitary elastic; when elasticity is less than −1, demand is said to be elastic. Elasticity is generally not constant over a whole demand curve. For a linear demand curve with constant slope, it is obvious from equation (2.5) that demand is more elastic (elasticity is larger in absolute value) at high prices and less elastic at low prices. This is often the case even when demand is not linear. Intuitively, when prices rise, consumers tend to become more price-sensitive. At very low prices, consumers tend to be relatively insensitive to price changes.
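A point elasticity is one line of arithmetic once the slope is known. The numbers below are illustrative (the first set echoes the linear-demand example later in this chapter):

    # Point elasticity of demand: eta = (dq/dp) * (p / q)
    def elasticity(slope, p, q):
        return slope * p / q

    print(elasticity(slope=-250, p=10, q=7500))  # -1/3: inelastic
    print(elasticity(slope=-250, p=30, q=2500))  # -3.0: elastic at a high price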

Demand Approximations

Economic theory only says demand slopes down – it says nothing directly about the shape of the demand curve. For applications (and for writing homework and exam problems) we need to specify more about the demand relationship. This often involves simply assuming that a particular shape is a good enough approximation of the shape of the true underlying demand curve and then choosing the parameters of the approximation to fit the actual demand curve as closely as possible. In practice, one of two assumptions is almost always made about the shape of the demand curve – either slope is assumed to be constant or else elasticity is assumed to be constant. More precisely, it is assumed that either slope or elasticity is relatively constant over the range of prices under consideration.

Linear Demand Approximations

If we assume that the slope of the demand curve is constant, we are using a linear approximation to model demand. The name linear demand comes from the assumption that each variable that is being measured (price, income, etc.) affects demand at a constant (linear) rate, regardless of how big or small the variable is. For example, if an increase in per capita income of $5,000 leads to an increase in quantity demanded of 30 units at a given level of price and income, the same is true at any other level of price or income. In a linear demand approximation, the coefficient of each variable represents the change in quantity demanded per unit change in the variable. A generic linear demand representation looks like the following:

qD = b0 + bp p + bM M + bS pS + bC pC + bN N + bZ Z + ε .

(2.6)

The coefficients, or parameters, are chosen to fit the observed data on demand as closely as possible. We will cover that in the next chapter. For now, we simply want to introduce the idea of using a straight line to approximate demand and focus on how to use such a model once we have it. In equation (2.6), p is price, M is income, pS and pC are the prices of substitutes and complements, N is the number of customers (or size of the market), Z represents any other factors that are important in a particular situation, and ε represents a random error term. The error term encompasses all the factors that cannot be readily understood and measured. Note that bp will always be negative because price and quantity are inversely related. Recall that for a specific point on the demand curve, elasticity is

η = (dq/dp)(p/q) ;

thus, for this linear approximation, elasticity is

η = bp (p/q) .    (2.7)

This elasticity is large in absolute value at high prices and approaches 0 as price decreases, as described previously. When focusing on setting price or choosing production levels, it is simplest to represent quantity demanded as a function of price only. To do so we will lump all of the other variables into a single coefficient A, and let B, which is positive, be the absolute value of the slope. This single-variable linear demand function is then

qD = A − Bp .

(2.8)

Note that the slope of the demand curve is still negative because –B is negative. It is important to understand that by simplifying the demand approximation to a function of a single variable, all of the other effects (income, price of substitutes, etc.) are being represented in the intercept A. It is also often more convenient to rearrange the demand curve and deal with inverse demand. Doing so gives

q = A − Bp
q − A = −Bp
p = (q − A)/(−B)
p = A/B − (1/B)q .

So, inverse demand has a positive intercept and a negative slope. It is easier to write this as

p = a − bq ,    (2.9)

where a = A/B and b = 1/B.
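The algebra that converts (A, B) into (a, b) can be wrapped in a small helper, which also previews the example that follows (the helper name is ours):

    # Inverse demand p = a - bq from demand q = A - Bp
    def inverse_demand_params(A, B):
        return A / B, 1 / B  # (a, b)

    print(inverse_demand_params(10000, 250))  # (40.0, 0.004)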


Example: Linear Demand

Suppose demand is approximated by q = 10000 − 250 p . What is the interpretation of the slope, what is inverse demand, and what are quantity and elasticity if price is 10? Solution: The slope of ‐250 means if price increases by 1 unit, quantity falls by 250 units. To get inverse demand, just solve for p in terms of q.

q = 10000 − 250p
250p = 10000 − q
p = 10000/250 − (1/250)q
p = 40 − 0.004q

To find quantity and elasticity if price is 10, plug 10 into the demand function to find q. Then plug slope, price, and quantity into the definition of elasticity.

q = 10000 − 250(10) = 7500

η = bp (p/q) = −250 × (10/7500) = −1/3

Log-Linear (Constant Elasticity) Demand Approximations

Another approach when approximating demand is to assume the elasticity of demand is constant. This is called a log-linear demand model – the reason why will become apparent later. By assuming constant elasticity, log-linear demand is making a presumption about how consumers' buying habits change as demand determinants change, on a percentage basis. For instance, if a firm assumed that an increase in per capita income of 10% would result in a 5% increase in quantity demanded, regardless of the current level of income or other demand determinants, we would be assuming a constant income elasticity of demand of 0.5. A generic constant elasticity demand approximation looks like the following,

qD = e^(b0+ε) · p^(bp) · M^(bM) · pS^(bS) · pC^(bC) · N^(bN) · Z^(bZ) ,    (2.10)

where the exponents on the demand determinants turn out to be the constant demand elasticities – again the reason why will become apparent soon. Just as in the linear demand model above, ε is a random error term and bp is negative. If we were to take the logarithm of both sides of this approximation and apply the laws of logarithms, we would obtain

ln(qD) = b0 + bp ln(p) + bS ln(pS) + bC ln(pC) + bM ln(M) + bN ln(N) + bZ ln(Z) + ε .    (2.11)

Note that this approximation is linear in the natural logs of the variables. That is, treating the logs of the original variables as the dependent and independent variables, we have a linear equation, thus the name log‐linear demand model. If we wanted a simplified log linear approximation with price as the only variable, it would look like

q = Ap^(−B) ,    (2.12)

or

ln(q) = ln(A) − B ln(p) ,    (2.13)

where again A is just standing in for all the other variables and where we are again letting B represent the absolute value of the coefficient on price, which is an exponent in the demand curve in this case. In this simplified form, we can easily look at both the slope and the elasticity of the demand approximation. The slope is the derivative of quantity with respect to price,

dq/dp = −BAp^(−B−1) .

This can also be written as

dq/dp = −B · (Ap^(−B))/p .

Finally, noting that q = Ap^(−B), this is just

dq/dp = −B(q/p) .    (2.14)

With that, it is easy to find elasticity using its definition from equation (2.5)

η = (dq/dp)(p/q) = −B(q/p)(p/q) = −B .

Thus, with a log-linear demand approximation, the elasticities of demand with respect to any of the independent variables are constant and equal to the coefficient on that variable.

Example: Log Linear Demand

Suppose demand is approximated by q = 20000p^(−2). What is the elasticity of demand, what quantity is demanded at a price of $20, what is the slope at that price, and what is inverse demand? Solution: The elasticity is given by the exponent, −2. To find quantity demanded at a price of $20, just plug 20 into the demand curve.

q = 20000(20)^(−2) = 20000/400 = 50

To find the slope at that point, take the derivative of demand and plug in the values for the given point.

dq/dp = (−2)20000p^(−3) = −40000/20^3 = −5

Finally, to get inverse demand, rearrange the demand curve to express price in terms of quantity. First, multiply both sides by p^2 and divide both sides by q:

p^2 = 20000/q .

Second, raise both sides to the ½ power to isolate price:

p = (20000/q)^(1/2) ≈ 141.42 q^(−0.5) .
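Every number in this example can be replicated in a few lines:

    # Check the log-linear example: q = 20000 * p**(-2)
    p = 20.0
    q = 20000 * p ** -2
    slope = -2 * 20000 * p ** -3
    eta = slope * p / q
    print(q, slope, eta)  # 50.0, -5.0, -2.0: elasticity equals the exponent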

Revenue and Marginal Revenue

A firm's total revenue is simply the price it charges times the quantity it sells: revenue = price × quantity, or

R = pq .

(2.15)

Of course, the firm is not free to choose both price and quantity. If the firm is so small that its actions can have no noticeable effect on the market price, it is said to be a price taker, or perfectly competitive. In that case, the firm may choose only how much to sell at the going market price.

If the firm is large relative to its product market, its actions will have non‐ negligible effects on the market price. In that case, the firm is said to be a price maker, or to possess a degree of market power. Still, the firm is not free to choose both price and quantity because it is constrained by consumer demand. If the firm wants to sell a lot, it must set a lower price. If it wants to charge a high price, it must resign itself to lower sales. So, the firm can choose price or quantity, but not both. When demand slopes down, if the firm sells so many units as to drive the price all the way to 0, revenue is 0. Similarly, revenue is 0 if the firm sells 0 units. In between, revenue is positive, reaching a maximum possible value at some intermediate quantity. In general, selling additional output has two offsetting effects on revenue. First, the additional sale brings in new revenue. Second, in order to sell the additional unit, price must be set lower. This reduces revenue. The graph to the right illustrates this important point. Suppose we begin at a price of p1 and sell a quantity of q1. In order to sell more units, we lower our price to p2, at which price we sell q2 units. What are the effects on revenue? For each unit we initially sell, we receive less money per unit, due to the decrease in price. This loss in revenue is represented by the rectangle labeled “LOSS”. However, our lower price causes more people to want to buy our product, so we sell more units. The increase in unit sales adds to our revenue and this increase is represented by the rectangle labeled “GAIN”. If the gains outweigh the losses, selling the additional units adds to the firm’s total revenue. If the losses outweigh the gains, the change in revenue from selling the additional units is negative. This change in total revenue from selling a single additional unit is called marginal revenue. In other words, it’s the rate at which total revenue changes when more output is sold. Starting from a high price and low quantity, marginal revenue is positive – the direct increase in revenue from the extra unit sold outweighs the decrease in price. At high quantities and low prices, marginal revenue is negative – the direct increase in revenue from the additional unit is overwhelmed by the decrease in price.

The top panel of the figure to the right depicts the relationship between revenue and quantity when demand slopes down, R(q). The bottom panel shows marginal revenue, MR(q). When quantity is 0, so is revenue. When quantity is so high as to drive price to 0, revenue is again 0. At low quantities, revenue is rising with additional sales, so marginal revenue is positive. At high quantities, revenue falls with additional sales and marginal revenue is negative. At the maximum revenue, marginal revenue exactly equals 0 – the direct gain from selling an additional unit is exactly offset by the decline in price needed to generate a higher sales total. To model revenue mathematically, starting from an inverse demand curve, p(q), total revenue would be price times quantity

R = p(q)q .

(2.16)

Marginal revenue is the derivative of total revenue with respect to quantity. Using the chain rule, this is

MR = dR/dq = p + (dp/dq)q .    (2.17)

It is important to really understand how equation (2.17) tells in part of one line the whole story about marginal revenue explained in three paragraphs of words above. When another unit is sold, the direct effect is to bring in additional revenue equal to the price charged for that unit, p, the first term in the derivative. The indirect effect is to lower price by the sensitivity of price to quantity sold, dp/dq. That decrease in price is applied to all units that are to be sold, q. So, we sell an additional unit at price p but receive a lower price on all q units sold. It may initially seem that marginal revenue is always simply the price that the firm sets for its product. However, looking at the above equation, it is clear that this is only true if dp/dq is 0 ‐ that is if the firm’s sales have no effect on price. That means the firm is a price taker and the inverse demand curve faced by the firm is a horizontal line at whatever the going market price is. Markets where all firms are price takers are called perfectly competitive. Most firms have some control over the price they charge for their products though, so dp/dq will be negative and marginal revenue will be less than price. However, in cases where the individual firms have only a little control over price, it is often more convenient, and realistic enough, to just treat them as price takers and ignore their tiny impact on market prices.

Example: Revenue and Marginal Revenue

Suppose inverse demand is approximated by p = 7 − 0.3q . What are the equations for revenue and marginal revenue, what quantity and price would maximize revenue, and what is the maximum value of revenue? Solution: Revenue is price times quantity, so R = ( 7 − 0.3q ) q .

Marginal revenue is the derivative of revenue with respect to q. Using the product rule,

MR = 7 − 0.3q − 0.3q = 7 − 0.6q . (In general, with linear demand, marginal revenue has the same intercept and double the slope of the inverse demand curve. You should prove that to yourself using the general form of an inverse demand curve, p = a − bq .) Revenue is maximized where marginal revenue is 0, so

MR = 0
7 − 0.6q = 0
0.6q = 7
q = 70/6 ≈ 11.67

p = 7 − 0.3(70/6) = 42/6 − 21/6 = 21/6 = 3.5

R = (21/6)(70/6) ≈ 40.83
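The same maximization can be done symbolically (a sketch, assuming sympy is available):

    import sympy as sp

    q = sp.symbols('q', positive=True)
    R = (7 - 0.3*q) * q               # revenue from the example
    MR = sp.diff(R, q)                # marginal revenue: 7 - 0.6q
    q_star = sp.solve(sp.Eq(MR, 0), q)[0]
    print(q_star, R.subs(q, q_star))  # q* = 70/6 ≈ 11.67, R ≈ 40.83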


Marginal Revenue and Elasticity

Recall that, from the definition of marginal revenue,

dR/dq = p + (dp/dq)q .

If we multiply the second term by (p/p) and rearrange, we obtain

dR/dq = p + (dp/dq)(p/p)q
dR/dq = p + (dp/dq)(q/p)p .

Observing that (dp/dq)(q/p) is the inverse of elasticity, we can write

dR/dq = p + (1/η)p

dR/dq = p(1 + 1/η) .    (2.18)

This equation describes the relationship between marginal revenue and elasticity. If the elasticity of demand exceeds 1 in absolute value so that demand is elastic, marginal revenue is positive. That is because the increase in quantity sold will swamp the decline in price. As elasticity increases in absolute value, 1/η approaches zero, so marginal revenue approaches price. As elasticity becomes very large, the marginal revenue from the next unit sold will be almost the same as the current price. Notice that if demand is inelastic (elasticity is between 0 and ‐1), marginal revenue will be negative. This occurs because lowering price when demand is inelastic will cut revenue. Since more units sold means higher cost, it will also cut profit. In other words, if you truly believe demand is inelastic, you ought to raise price.



Example: Marginal Revenue and Elasticity

Suppose elasticity is ‐0.25. What is marginal revenue? Solution: From equation (2.18)

MR = p\left(1 + \frac{1}{-0.25}\right) = p(1 - 4) = -3p .

Since elasticity is less than 1 in absolute value, MR is negative! Price should be increased.

Profit Maximization

We have already established that it is a manager's ultimate duty to maximize the present value of the profits created by the firm. If a manager sets a price that is too low, the firm may sell a lot of units, but profit per unit will be so low that total profits are small. On the other hand, if a manager sets a price that is too high, quantity demanded will decrease too sharply, leading potentially to high profit per unit but again low total profit. How do we model choosing a price and production level to maximize profit? The figure at right shows a generic profit function, π = R(q) − C(q). Just as in the above discussion, it is clear that selling too many or too few units will result in a loss of potential profits. Using calculus, we can maximize this profit function by setting its derivative equal to zero:

\frac{d\pi}{dq} = \left(p + \frac{dp}{dq}q\right) - \frac{dC}{dq} = 0 .

Based on the definitions of marginal revenue and marginal cost, this profit-maximizing condition can be written as MR − MC = 0

or

MR = MC .    (2.19)

So, when the revenue a firm gets from selling its last unit equals the cost of making that last unit, the firm is maximizing profit. If marginal revenue is greater (less) than marginal cost, the change in profit from selling another unit will be



positive (negative) – so the firm should sell more (less). The graph to the right shows this profit-maximizing condition in general. In order to sell the profit-maximizing quantity, q*, where marginal revenue equals marginal cost, the firm sets a price of p*.

This is the basic idea at the core of the class. If doing something increases profit, do it more. If it lowers profit, do it less. Profit can only be maximized where the marginal benefits and marginal costs of any action balance one another exactly. Understanding what this means for decisions in various scenarios with information of varying completeness and complexity, and in situations where many decision makers are simultaneously trying to predict what everyone else will do so as to strategically maximize their own profits, will fill this course. Indeed, broadly understood, that basic pursuit fills all courses that fall under the large heading of microeconomics. So, the trick to mastering the course is to really understand what is going on here and then hone your critical thinking skills and analytical abilities so that you can apply this basic insight in many more complex situations.

What if we had maximized revenue instead? Then we would set marginal revenue equal to 0. Price would be lower and quantity would be higher. Importantly, profits would be lower. That is because the last units sold would have marginal revenue near 0 but a positive marginal cost. If the bottom line is profit, both cost and revenue must be factored in. Beware the recommendations of a sales or marketing department that is paid commissions based on revenue, not profit. It may be in their interest to recommend prices too low and sales too high for the good of the company!



Example: Profit Maximization

Suppose a firm faces an inverse demand curve of p = 7 − 0.3q and has a cost function of C(q) = 9 + 1.1q . Find the profit‐maximizing price and quantity, and calculate the firm’s profit. Solution: Set up the firm’s profit function and then maximize by setting the derivative equal to 0.

\pi = pq - C(q) = (7 - 0.3q)q - (9 + 1.1q) \quad\text{(substituting for p from inverse demand)}

\frac{d\pi}{dq} = 7 - 0.6q - 1.1 = 0

0.6q = 5.9 \Rightarrow q = \frac{5.9}{0.6} \approx 9.83

p = 7 - 0.3\left(\frac{5.9}{0.6}\right) = 4.05

\pi = 4.05\left(\frac{5.9}{0.6}\right) - \left(9 + 1.1\left(\frac{5.9}{0.6}\right)\right) \approx 20

Compared to the solution in the previous example where revenue was maximized, price is higher, sales are lower, revenue is lower, and profit is higher. Alternatively, if given a demand function instead of an inverse demand function, you could choose the price to maximize profit. To see that, first rearrange the given inverse demand function to get the corresponding demand function.

p = 7 - 0.3q \Rightarrow 0.3q = 7 - p \Rightarrow q = \frac{70}{3} - \frac{10}{3}p

Now set up the profit function, take the derivative, and solve.

\pi = pq - C(q) = p\left(\frac{70}{3} - \frac{10}{3}p\right) - 9 - 1.1\left(\frac{70}{3} - \frac{10}{3}p\right) \quad\text{(substituting for q from demand)}

= (p - 1.1)\left(\frac{70}{3} - \frac{10}{3}p\right) - 9


\frac{d\pi}{dp} = \frac{70}{3} - \frac{10}{3}p - \frac{10}{3}p + \frac{11}{3} = 0

\frac{20}{3}p = \frac{81}{3} \Rightarrow p = 4.05

q = \frac{70}{3} - \frac{10}{3}(4.05) \approx 9.83

\pi = 4.05(9.83) - (9 + 1.1(9.83)) \approx 20

Whether you should work from the demand curve or the inverse demand curve just depends on convenience. Most of the time, it will be easier to work from inverse demand in problems like this, but not always.
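Both routes to the answer are easy to verify numerically. The Python sketch below (an illustration we add here, using the example's demand and cost) solves the quantity form and the price form and confirms they agree:

# Profit maximization with p = 7 - 0.3q and C(q) = 9 + 1.1q.
# Quantity form: dpi/dq = 7 - 0.6q - 1.1 = 0.
q_star = (7 - 1.1) / 0.6               # ~9.83
p_star = 7 - 0.3 * q_star              # 4.05

# Price form: q = 70/3 - (10/3)p, dpi/dp = 70/3 - (20/3)p + 11/3 = 0.
p_alt = (70/3 + 11/3) / (20/3)         # 4.05, matching p_star

profit = p_star * q_star - (9 + 1.1 * q_star)
print(q_star, p_star, p_alt, profit)   # ~9.83, 4.05, 4.05, ~20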

Estimates of demand elasticity can also be used to maximize profit. Recall the equation relating marginal revenue to elasticity:

MR = p\left(1 + \frac{1}{\eta}\right) .

This relationship between elasticity and marginal revenue always holds since it was derived from a generic revenue formula. But how does it relate to marginal cost and overall profit? We know that if profit is being maximized, marginal revenue equals marginal cost. So, if a firm is maximizing profit, we can rewrite this relationship as

MR = p\left(1 + \frac{1}{\eta}\right) = MC

and, solving for price, we obtain

p = \left(\frac{\eta}{1 + \eta}\right)MC .    (2.20)

In this form, the coefficient on marginal cost, \frac{\eta}{1+\eta}, can be thought of as a "mark-up" factor. This factor tells us how much to mark up price above marginal cost to maximize profit based on our customers' sensitivity to price. This mark-up factor increases as elasticity decreases in absolute value; thus profit-maximizing prices are higher where consumers aren't as price sensitive. While this relationship always holds, it is particularly useful in situations where both marginal cost and elasticity are roughly constant in the face of small changes in quantity. In that case, equation (2.20) is not simply a condition that must be true if profits are maximized, but is a simple formula for estimating the profit-maximizing price, as long as it is not too far from the current price.

Example: Profit Maximization and Elasticity

Suppose elasticity is -5, marginal cost is 3, and price is currently 5.50. Without assuming elasticity and marginal cost are constant, is price too high, too low, or just right? If we assume elasticity and marginal cost are relatively constant, what price maximizes profit?

Solution: If elasticity is -5 and price is 5.5, marginal revenue is 5.5(1 - 1/5) = 4.4 from equation (2.18). Since this is higher than marginal cost, more units should be sold, meaning the current price is too high. To maximize profit, the mark-up over marginal cost should be -5/(1 - 5), or 1.25. So, price should be (1.25)(3) = 3.75.
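The same two-step logic is easy to script. A minimal sketch under the example's assumptions (constant elasticity −5, marginal cost 3, current price 5.50):

eta, mc, p_current = -5.0, 3.0, 5.50

mr = p_current * (1 + 1 / eta)   # equation (2.18): 5.5*(1 - 1/5) = 4.4
markup = eta / (1 + eta)         # -5/(1 - 5) = 1.25
p_best = markup * mc             # profit-maximizing price: 3.75

print(mr > mc)                   # True: MR exceeds MC, so price is too high
print(p_best)                    # 3.75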

Example: Profit Maximization using Mark-up

Suppose a firm faces a demand curve of q = 10,000p^{-3} and unit cost is constant at $4 per unit. Find the profit-maximizing price and quantity, and calculate the firm's profit.

Solution: This problem could be solved like the earlier example with linear demand, but given that we have a demand curve with constant elasticity, as well as a constant marginal cost, we can use equation (2.20) to find the price. Observing that the elasticity is -3 and the marginal cost is 4, price is

p = \left(\frac{\eta}{1 + \eta}\right)MC = \left(\frac{-3}{1 - 3}\right)4 = (1.5)(4) = 6

and quantity is

q = 10{,}000(6)^{-3} \approx 46.3 .

The firm’s profit is

π = (6 − 4)(46.3) = 92.6
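Since elasticity is constant everywhere on this demand curve, equation (2.20) pins down the price in one line. A quick numeric check (our own sketch, not from the text):

# Constant-elasticity demand q = 10,000 * p**(-3), constant unit cost 4.
eta, mc = -3.0, 4.0

p = (eta / (1 + eta)) * mc       # 1.5 * 4 = 6
q = 10_000 * p ** eta            # 10000/216, about 46.3
print(p, q, (p - mc) * q)        # 6, ~46.3, ~92.6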



Chapter 2 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Fixed Cost – A cost that remains constant (in a relative range of time) regardless of a change in the number of units produced. In the short run, this type of cost cannot be avoided, whereas in the long run, it can.

Law of Demand – Law that states that as price increases, quantity demanded decreases. This explains why demand curves are downward sloping and the price coefficient is always negative.

Linear Demand – An additive representation of demand that holds the slope constant. It assumes each variable that affects demand does so linearly, as opposed to having a squared or cubed relationship. The coefficient of each variable represents the way in which the variable affects demand.

Log-Linear Demand – A multiplicative representation of demand in which each variable is raised to a certain power. This model assumes that elasticity is constant and is the power to which the price variable is raised.

Marginal Cost – The cost incurred when a firm sells one additional unit of a good. It is the rate of change (or derivative with respect to quantity) of total cost. It is increasing because of the presence of a fixed factor. In order to maximize profit, this should be set equal to marginal revenue.

Marginal Revenue – The additional income of selling one more unit. It is the rate of change (or derivative with respect to quantity) of total revenue. In order to maximize profit, this should be set equal to marginal cost.

Mark-up Factor – A factor that shows by how much more a firm can charge consumers for a product than it costs to make. As consumers become more price sensitive, the mark-up factor approaches 1, that is, price approaches marginal cost. The less price sensitive consumers are, the higher the mark-up factor. The mark-up factor is multiplied by the marginal cost in order to find a firm's profit-maximizing price.

Market Power – A firm's ability to control and change the market price of a good. Whereas a perfectly competitive firm (price taker) has the least market power, a monopolistic firm (price maker) has the most.

Price Elasticity of Demand – The percentage change of quantity sold with respect to a percentage change in price. It is a tool that tells us how sensitive customers are to changes in price. Elasticity of demand will always be a negative number, and the more elastic demand is, the bigger the absolute value of elasticity will be.

Price Taker – A firm that has no control over the price of a good (e.g. a wheat farmer) and takes the going market price of that good as given. This type of firm has no market power, i.e. a perfectly competitive firm.



Profit – A firm's financial gain; the difference between total revenue and total cost. This is the amount that rational firms care most about and will always want to maximize when making decisions.

Revenue – A firm's income; the total amount received from consumers, or the price per unit of a good multiplied by the number of units sold.

Total Cost – The total amount it costs a firm to produce its output; the cost of producing a unit of a good multiplied by the number of units produced. This is the amount to be subtracted from revenue in order to find a profit or a loss.



Chapter 3 Applications and Extensions of Optimal Production and Pricing

In this chapter, we consider a number of related applications and extensions of the basic principles set out in Chapter 2. The specific extensions considered are: price discrimination, setting price across locations that differ in market size but are otherwise similar, the implications of a capacity limit, the optimal choice of capacity when demand varies over time, maximizing profit when there is uncertainty about a determinant of profit, and the value of additional information when faced with uncertainty. Rather than viewing them as separate topics, you should look at them as small additions or adjustments to the basic ideas introduced in the last chapter, or as small variations in the basic tools you have already learned to use. The primary goal of this course is for you to learn to use the tools of microeconomics to understand, analyze, and evaluate business decisions. Since the number of possible situations is unlimited, there is no hope of simply learning by rote what economic analysis concludes about every possible situation. Rather, you must practice generalizing and adapting the basic tools to ever new and changing situations.

Simple (3rd degree) Price Discrimination

Maximizing profit with a single product and a single customer type is easy enough. But sometimes, firms are able to extract additional profit by charging different types of customers different prices. This is known as simple (or 3rd degree) price discrimination. A common example of simple price discrimination is when a movie theater offers discounts to senior citizens. By doing this, it is charging two different groups of customers (senior citizens and non-senior citizens) two different prices for the same product. By charging each customer group a price based on that group's demand elasticity, the theater is able to increase its overall profit.

For a firm to effectively use simple price discrimination, its customers must fall into distinct groups that differ in their willingness to pay. These customer groups must also be easily identifiable, such as by IDs. Also, because the firm is charging two different prices for the same product, resale must not be possible. Otherwise, the customer group that gets the product for the cheaper price could make a profit by reselling to the other group at a slightly lower price than the firm charges. Assuming these conditions are met, a firm is able to maximize its profit by charging two different prices. Letting subscripts 1 and 2 denote two different consumer groups, the firm's profit is

\pi = p_1 q_1 + p_2 q_2 - C(q_1 + q_2) .    (3.1)

Cost depends on the total amount produced, which is q = q_1 + q_2. Since the cost of producing a unit is the same whether a type 1 customer or a type 2 customer ends up purchasing it, it follows that a firm will maximize profit when

MR1 = MR2 = MC .



That is, marginal revenue for both customer types must equal the common marginal cost of production. This makes sense. If marginal revenue exceeds (falls short of) marginal cost in either market, more (fewer) units should be sold in that market. Which market is charged the higher price? The one with the less elastic (less price‐sensitive) demand. From the rule established above for the relationship between the profit maximizing mark‐up and elasticity, saying that marginal revenue must equal the common marginal cost in both markets means:

p_1 = \frac{\eta_1}{1 + \eta_1}MC \quad\text{and}\quad p_2 = \frac{\eta_2}{1 + \eta_2}MC .

Thus, if the absolute value of elasticity is lower in market 1, the price is higher in that market. That is why senior citizens and students get discounts – they tend to have less disposable income and/or more time to shop, thus they are more sensitive to price.

Example: 3rd Degree Price Discrimination – Linear Demand

Suppose there are two types of customers with inverse demand curves p_1 = 20 − q_1 and p_2 = 30 − q_2. If the firm's cost is C(q) = 0.5(q_1 + q_2)^2, find how much more profit the firm can make by implementing 3rd degree price discrimination.

Solution: We need to find the firm's profit when it uses discrimination and when it charges a single price, and then find the difference. The firm's profit when charging two separate prices is

\pi = (20 - q_1)q_1 + (30 - q_2)q_2 - 0.5(q_1 + q_2)^2

and in order to maximize profit, MR_1 = MR_2 = MC. With MC = q_1 + q_2, the two conditions are

MR_1 = MC: \quad 20 - 2q_1 = q_1 + q_2
MR_2 = MC: \quad 30 - 2q_2 = q_1 + q_2

Setting the left-hand sides equal, 20 - 2q_1 = 30 - 2q_2, so 2q_2 = 10 + 2q_1 and q_2 = 5 + q_1. Substituting into the first condition:

20 - 2q_1 = q_1 + 5 + q_1 \Rightarrow 15 = 4q_1 \Rightarrow q_1 = 3.75

q_2 = 5 + q_1 = 8.75

p_1 = 20 - 3.75 = 16.25, \quad p_2 = 30 - 8.75 = 21.25

\pi = 16.25(3.75) + 21.25(8.75) - 0.5(3.75 + 8.75)^2 = 168.75

Now suppose that the firm can only charge a single price. We need to find the firm’s total demand by combining the individual consumer types’ demand curves. Since the total amount the firm sells is the sum of the amount they sell to



type 1 and type 2, we need to solve the original inverse demand curves for quantity:

p1 = 20 − q1 → q1 = 20 − p1

p2 = 30 − q2 → q2 = 30 − p2

To find the firm’s total demand curve with one price ( p1 = p2 = p ), add both of these quantities:

q = 50 − 2 p

Now, rearranging to find inverse demand

p = 25 − 0.5q

we can solve for the quantity and price that maximize profit.

\pi = (25 - 0.5q)q - 0.5q^2

MR = MC

25 − q = q

q = 12.5

p = 25 − 0.5(12.5) = 18.75

\pi = 18.75(12.5) - 0.5(12.5)^2 = 156.25

So, by price discriminating, the firm is able to make an additional 168.75 − 156.25 = 12.50 in profit.
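The whole example can be checked in a few lines. The sketch below is our own illustration, with the 2x2 first-order-condition system solved by hand-coded elimination:

# 3rd degree price discrimination: p1 = 20 - q1, p2 = 30 - q2,
# C = 0.5*(q1 + q2)**2, so MC = q1 + q2.
# FOCs: 20 - 2*q1 = q1 + q2 and 30 - 2*q2 = q1 + q2,
# i.e. 3*q1 + q2 = 20 and q1 + 3*q2 = 30; eliminate to solve.
q1 = (3 * 20 - 30) / 8                       # 3.75
q2 = (3 * 30 - 20) / 8                       # 8.75
pi_disc = (20 - q1)*q1 + (30 - q2)*q2 - 0.5*(q1 + q2)**2

# Single price: total demand q = 50 - 2p, inverse p = 25 - 0.5q,
# MR = 25 - q equals MC = q at q = 12.5.
q = 12.5
pi_single = (25 - 0.5*q)*q - 0.5*q**2

print(pi_disc, pi_single, pi_disc - pi_single)   # 168.75, 156.25, 12.5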

Profit maximization when purchases per capita do not depend on market size

It is often reasonable to assume purchases per capita do not depend on the size of the market. That is, in two markets that are similar in terms of income, demographics, and anything else that affects demand other than market size, purchases will be proportional to population at a given price unless there is some specific reason that preferences for the product are related to population. For example, the demand for public transportation may be greater in larger cities due to higher congestion. But, for something like movie tickets or hamburgers, it is hard to see why population size, in itself, should affect the demand of an individual consumer. One natural application of this model occurs when goods are highly durable or infrequently purchased for some other reason, in which case each consumer purchases either 0 or 1 unit in any given time period. For example, few individuals buy more than one house or car in any given year, more than one concert ticket for any given tour, more than one football ticket for a given game, or more than one movie ticket for a given showing. In that case, purchases per capita can be thought of as (very nearly) the fraction of consumers that purchase the good.



In this case, demand can be written as

q_D = f(p, p_S, p_C, M, z)N

where N is the city size and f(p, p_S, p_C, M, z) is purchases per capita, or the fraction of the population that buys the good for goods that are durable or otherwise infrequently purchased. This model is useful because if you know a lot about demand in Gainesville and you want to estimate demand in Jacksonville, you can draw conclusions as long as you assume that everything is similar enough between the locations, except city size. How does city size affect demand elasticity, and thus the profit-maximizing mark-up, if it does not affect demand per capita? It doesn't. To see why, first note we can write demand as

q_D = Nf(p) .    (3.2)

Thus, using the definition of elasticity and equation (3.2),

\eta = \frac{dq}{dp}\frac{p}{q} = N\frac{df}{dp}\cdot\frac{p}{Nf(p)} = \frac{df}{dp}\cdot\frac{p}{f(p)} .    (3.3)

So, if a firm's marginal cost is roughly constant, this model shows the profit-maximizing price doesn't vary with city size, only with other demand shifters.



Example: Demand for Infrequently Purchased Goods

Of 1,000 potential customers, the fraction purchasing is approximated by f(p) = 1 − 0.1p, and constant unit cost is $2. Find the price and quantity that maximize profit.

Solution: The quantity of tickets sold is the number of people times the fraction that will buy them

q = 1000(1 − 0.1 p )

and profit is then

π = pq − 2q

π = ( p − 2)q

\pi = (p - 2)\cdot 1000(1 - 0.1p)

\frac{d\pi}{dp} = 1000\left((1 - 0.1p) - 0.1(p - 2)\right) = 0

1 − 0.1 p − 0.1 p + 0.2 = 0

0.2 p = 1.2

p = 6 , q = 1000(1 − 0.1(6)) = 400 , and π = (6 − 2)(400) = 1600
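A numeric version of this example (a sketch we add, with N = 1000 and f(p) = 1 − 0.1p as given):

# Per capita demand model: q = N*f(p), with f(p) = 1 - 0.1p and unit cost 2.
# pi(p) = (p - 2)*1000*(1 - 0.1p); dpi/dp = 1000*(1.2 - 0.2p) = 0.
N, c = 1000, 2.0
p = 1.2 / 0.2                    # 6
q = N * (1 - 0.1 * p)            # 400
print(p, q, (p - c) * q)         # 6, 400, 1600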

Maximizing Profit with a pre-determined capacity constraint

Something that is common among many examples of infrequently purchased goods – movie theaters, sporting events, etc. – is the presence of venues, which have a limited amount of seating. In the previous section, we faced a demand function that was proportional to population size and maximized profit, but what if the profit-maximizing price sold too many seats and the venue ran out of space? In practice, firms often have to deal with capacity constraints. If our profit-maximizing quantity is less than our venue's capacity, the constraint doesn't affect our decision. If, however, our optimum price sells more seats than we have available, the next best thing we can do is raise price until the excess tickets are no longer demanded. In this way, our price will be set where the quantity demanded is exactly equal to our venue's capacity (we will sell out). This situation is depicted in the figure below. The figure shows inverse demand per capita (or the fraction purchasing). Capacity is limited to q units, so the fraction purchasing cannot exceed q/N. If the profit-maximizing price is p* ignoring capacity limits, the fraction purchasing would be too high. So, price must rise to p** to equate demand and capacity.



[Figure: per capita inverse demand f(p). At the unconstrained profit-maximizing price p*, f(p*) > q/N; price must rise to p**, where f(p**) = q/N.]

Example: Infrequently Purchased Goods with a Capacity Constraint

Of 1,000 potential customers, the fraction purchasing is approximated by f(p) = 1 − 0.1p, and constant unit cost is $2. Find the price and quantity that maximize profit, assuming a capacity constraint of q = 200.

Solution: Since this is the same information as in the previous question, we know the unconstrained profit-maximizing quantity is 400 tickets. Since our capacity constraint won't allow us to sell this many tickets, we need to find the price that will sell 200 tickets and thus sell out our theater.

q = 1000(1 − .1p) = 200

1 − .1p = .2

p = 8

π = (8 − 2)(200) = 1200
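The capacity-constrained logic is a one-line check followed by inverting demand at capacity. A sketch under the example's numbers:

# Same demand as before, but only 200 seats available.
N, c, cap = 1000, 2.0, 200

q_unconstrained = N * (1 - 0.1 * 6.0)   # 400 at the unconstrained optimum p = 6
if q_unconstrained > cap:
    # Raise price until demand equals capacity: N*(1 - 0.1p) = cap.
    p = (1 - cap / N) / 0.1             # 8
    print(p, (p - c) * cap)             # 8, 1200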

Peak-Load Pricing – Determining capacity when demand varies over time

In the last few applications, we showed that ticket sales of 400 would have given the firm $1600 in profit, but when its capacity limited its sales to 200, its profit was reduced to $1200. If the firm had been able to choose how large it wanted its venue to be, 400 seats would be optimal, since 400 tickets would maximize its profit. But this number was only based on a single demand function. If the profit-maximizing quantity was 400 at some points of the year but only 100 at other points, it may make sense to build a venue that has a capacity of 200 seats.



In situations where demand differs by time of day (or season), it is natural to suggest a firm ought to charge a higher price during the time of day when customers have a higher willingness to pay; this is the notion of peak‐load pricing. For example, suppose a restaurant experiences higher demand during dinner than it does during lunch. To maximize profit, it should charge dinner customers a higher price than it charges lunch customers. Since the assumption for this example is that demand will be higher at dinner, the quantity it sells during dinner will be greater than or equal to the quantity it sells during lunch. To proceed, we need to model both operating cost and capacity cost. Assume the marginal cost of selling one more unit is constant at c, the operating cost per unit. In our example, that would be the cost of an additional meal: setting one more place, preparing one more meal, bussing one more table, etc. If there is plenty of capacity already at hand, these are the only costs incurred if one more unit is produced. If, however, there is not extra capacity available, there is an additional capacity cost. We assume the marginal cost of an additional unit of capacity is constant and equal to k, the capacity cost per unit. In the restaurant example, think of the capacity cost per unit as the opportunity cost of seating one more customer during peak hours: the cost of getting additional floor space, another table, another chair, perhaps a larger kitchen so more meals can be simultaneously prepared, etc. Assuming the quantity sold during peak hours is greater than during off‐peak hours, demand during peak hours determines our capacity. That means that, in our restaurant example, we will only need to worry about capacity costs at dinner, while at lunch, sections of the restaurant will be roped off and not used. So, if we decide to sell another unit during peak hours, we must increase the capacity of our restaurant. The marginal cost of selling one more meal during peak times is c + k . Therefore, to maximize profit with high‐demand times and low‐demand times, a firm should set

MR_L = MC_L = c    (3.4)

and

MR_H = MC_H = c + k    (3.5)

where the H and L subscripts refer to high and low demand respectively. The figure to the right shows inverse demand and marginal revenue for both high (H) and low (L) demand times, along with peak and off‐peak marginal cost. During low demand, the optimal quantity sets marginal revenue equal to marginal operating cost, and the optimal price is chosen accordingly. During high demand times, marginal revenue is equated to the sum of marginal operating and capacity costs to find the optimal quantity, and price is chosen accordingly. In the restaurant



example, this corresponds to offering early bird and lunch specials and charging full prices in the evenings. In the situation depicted in the figure above, the quantity of meals sold in the high-demand period (qH) is higher than the quantity of meals sold in the low-demand period (qL), as we assumed it would be.

If the willingness to pay in the high demand period is not much greater than the willingness to pay in the low demand period, the quantity of meals sold at low-demand times (lunch) may turn out higher than the quantity of meals sold at high-demand times (dinner) if we apply the technique described above to determine prices. To illustrate, consider the figure to the right. Remember, we assumed capacity is determined only by the high-demand period of the day. In the situation above, we find that qH is actually less than qL. But, since we've built capacity based on high demand, we won't have enough capacity to serve the low-demand customers. This is known as a "shifting peak" and is even more likely when the off-peak price affects peak demand and vice versa – that is, when peak and off-peak consumption are, to some degree, substitutes. Thus, whenever working these problems, it's important to check that this assumption holds; namely, that qH > qL. At most, the number of units sold at either time of day can equal capacity.

If your solution violates that working assumption, it's back to the drawing board. We would never actually sell less when demand was higher. At most, we would use all of our capacity at both times of day. So, if the solution calls for the low demand quantity to be highest, we must go back and impose on the problem the constraint that qH = qL. So, when solving a peak-load problem, maximize the firm's profit, which is

\pi = p_H(q_H)q_H + p_L(q_L)q_L - cq_L - (c + k)q_H .    (3.6)

If, upon solving, you find qH ≥ qL , your answer is correct. If, however, qH < qL , you must rework the problem assuming that qH = qL . Since quantities are the same at peak and off peak demand, we drop the subscripts and denote each by simply q. It is important to realize that total output, the sum of peak and off peak sales, will then equal 2q. While operating cost is incurred for all units produced at each time of day, capacity cost is incurred only once for both periods. In this case, profit would become

\pi = p_H(q)q + p_L(q)q - (2c + k)q .    (3.7)

Maximizing, we obtain

\frac{d\pi}{dq} = MR_H + MR_L - (2c + k) = 0 , or


MR_H + MR_L = 2c + k .    (3.8)

This says the combined marginal revenue of the last unit sold at peak demand and the last unit sold at off-peak demand must equal the marginal operating cost of producing each unit plus the marginal cost of adding the last unit of capacity that allows each unit to be produced.

Example: Peak Load Pricing

Suppose a restaurant faces inverse demand of pH = 14 − 0.5qH at high demand times and pL = 12 − 0.5qL at low demand times. If operating costs are $2 per unit and capacity costs are $4 per unit, find the profit‐maximizing prices for both times. Solution: First, treat the problem as if the constraint qH ≥ qL is not violated. So, profit is

\pi = (14 - 0.5q_H)q_H + (12 - 0.5q_L)q_L - 2q_L - 2q_H - 4q_H

\frac{\partial\pi}{\partial q_H} = 14 - q_H - 2 - 4 = 0 \Rightarrow q_H = 8

\frac{\partial\pi}{\partial q_L} = 12 - q_L - 2 = 0 \Rightarrow q_L = 10

Since the constraint is violated, we have broken our assumption that we sell more during peak demand than we do during off-peak demand. So, we must sell the same quantity during both demand times. Assuming q_H = q_L = q, profit becomes

\pi = (14 - 0.5q)q + (12 - 0.5q)q - 2q - 2q - 4q

\frac{d\pi}{dq} = 14 - q + 12 - q - 2 - 2 - 4 = 0 \Rightarrow q = 9

p_H = 14 - 0.5(9) = 9.5 , \quad p_L = 12 - 0.5(9) = 7.5
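The solve-then-check structure of peak-load problems maps naturally into code. A sketch using the example's demands and costs, including the shifting-peak fallback:

# Peak-load pricing: pH = 14 - 0.5qH, pL = 12 - 0.5qL, c = 2, k = 4.
c, k = 2.0, 4.0

# Unconstrained FOCs: MRH = 14 - qH = c + k and MRL = 12 - qL = c.
qH = 14 - (c + k)                       # 8
qL = 12 - c                             # 10

if qH < qL:                             # shifting peak: impose qH = qL = q
    # MRH + MRL = 2c + k: (14 - q) + (12 - q) = 2c + k.
    q = (14 + 12 - (2 * c + k)) / 2     # 9
    print(q, 14 - 0.5 * q, 12 - 0.5 * q)   # 9, 9.5, 7.5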



Profit maximization with uncertainty

In Chapter 1, we saw how the presence of uncertainty creates a situation where decisions must be based on probability estimates, reducing expected profit. When there is uncertainty about demand or cost conditions, managers are similarly limited in their ability to make optimal decisions. On the demand side, for example, the demand for trucks and SUVs three years from now may depend on the future cost of gasoline. Similarly, uncertainty about future fuel costs may mean there is important uncertainty about operating costs in industries where fuel is a high share of costs.

We are interested in situations in which the presence of uncertainty affects the choice of production levels and pricing. Therefore, we are interested in situations in which the production decision must be made before significant uncertainty about the state of demand or marginal cost is resolved. For example, the profitability of a decision to drop some production lines devoted to SUVs in favor of adding lines to produce more fuel efficient hybrid vehicles ultimately depends on the unknown future demand for SUVs and hybrids. In this section, we analyze decisions about profit maximization in the face of such uncertainty. While our discussion will focus on uncertainty about the level of demand, the same general insights and approach are applicable to uncertainty on the cost side as well.

To focus our discussion, let's think in terms of a very simple example. Consider a hot dog vendor on a beach. If the weather stays sunny, he will have a large lunchtime crowd. If a thunderstorm breaks out, he will have very few customers. Just before lunch time, he must put some hot dogs on to cook and buns on to warm. If more people show up than he has hot dogs prepared for, they will leave and go elsewhere before he can cook more. If fewer people show up, the extra hot dogs and buns are wasted. The problem is, he does not know if a thunderstorm will break out or not – he has only a probability estimate. How should he determine how many hot dogs to prepare?

To help us answer this question, let us first consider what the vendor would do if the uncertainty were resolved before he had to make a decision. In that case, he would find the optimal quantity and price for each possible outcome (good weather or poor weather), wait until he knew which outcome was going to occur for certain, and subsequently cook the corresponding number of hot dogs. Let pH(qH) be the (inverse) demand curve if demand is high (good weather),

and pL(qL) be demand if demand is low (poor weather), where qH is the quantity sold if demand is high and qL is the quantity sold if demand is low. Then, if demand turns out to be high, the vendor will maximize profits by choosing the quantity q_H^* that satisfies MR_H = MC_H and will charge a price of p_H(q_H^*) = p_H^*. Likewise, if demand turns out to be low, the manager will sell a quantity q_L^* which satisfies MR_L = MC_L and charge a price of p_L(q_L^*) = p_L^*. Given relative probability assessments about the likelihood of each of these outcomes occurring, we can



calculate the firm's expected profit from the point of view of someone evaluating the future before the state of nature is revealed. If Pr(H) is the chance that demand is high, and Pr(L) the chance that demand is low, the firm's expected profit is

E(\pi) = \Pr(H)\left(p_H^* q_H^* - C(q_H^*)\right) + \Pr(L)\left(p_L^* q_L^* - C(q_L^*)\right) .    (3.9)

Example: Expected profit when uncertainty is resolved before production

Suppose a firm faces uncertain demand, where inverse demand for high demand periods is p_H = 20 − q_H/4, and inverse demand for low demand periods is p_L = 10 − q_L/4. If the probability of high demand is 50%, and the firm's marginal cost is constant at $2, what is the firm's expected profit if the uncertainty is resolved before production decisions are made?

Solution: First, find the prices that the manager would set at each demand level. If demand is high, the profit-maximizing condition is MR_H = MC_H, so

20 - \frac{q_H}{2} = 2 \Rightarrow 18 = \frac{q_H}{2} \Rightarrow q_H = 36

and the price that sells this quantity is

p_H(36) = 20 - \frac{36}{4} = 11 .

If demand turns out to be high, profit is

\pi_H = 11(36) - 2(36) = 324 .

For the low demand period, the optimality condition is MR_L = MC_L, so

10 - \frac{q_L}{2} = 2 \Rightarrow 8 = \frac{q_L}{2} \Rightarrow q_L = 16

and the price is

p_L = 10 - \frac{16}{4} = 6 .

Profit in low demand periods, then, is

\pi_L = 6(16) - 2(16) = 64 .



The firm’s expected profit is the probability of each outcome, times the profit it receives in that case, or

E(\pi) = \Pr(H)\pi_H + \Pr(L)\pi_L = 0.5(324) + 0.5(64) = 194
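A compact numeric rendering of this benchmark case (our sketch; for p = a − q/4 with constant marginal cost, MR = a − q/2):

# Uncertainty resolved before production: pH = 20 - q/4, pL = 10 - q/4, MC = 2.
def state_profit(a, mc=2.0):
    # Best profit when inverse demand is p = a - q/4 and MC is constant.
    q = 2 * (a - mc)         # MR = a - q/2 = mc
    p = a - q / 4
    return (p - mc) * q

pi_H = state_profit(20)      # 324 at qH = 36, pH = 11
pi_L = state_profit(10)      # 64 at qL = 16, pL = 6
print(0.5 * pi_H + 0.5 * pi_L)   # 194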

Now let’s return to the question of making a production decision before the uncertainty is resolved. We can use the answer to the question above as a benchmark to see how much uncertainty reduces the firm’s expected profit. One way to approach this situation is to choose a single quantity to produce and then set whatever price is necessary to sell that quantity, whether demand turns out to be high or low. You might think of it in terms of offering a price discount on rainy days large enough to sell everything you already prepared even if demand is low. It turns out this approach is not quite right (even though this is the solution presented in some managerial economics textbooks). But, working through it is instructive nonetheless, because, sometimes, it is right, and because we need to understand when and how it can go wrong so that we may see how to correct it. Since there’s only one quantity no matter the weather, qH = qL = q , and expected profit becomes

E(\pi) = \Pr(H)p_H(q)q + \Pr(L)p_L(q)q - C(q) .    (3.10)

Maximizing, we obtain:

\frac{dE(\pi)}{dq} = \Pr(H)\left(\frac{dp_H}{dq}q + p_H\right) + \Pr(L)\left(\frac{dp_L}{dq}q + p_L\right) - \frac{dC}{dq} = 0 .

Note that the term multiplied by the probability of high demand is the marginal revenue if demand is high, given output q, and the term multiplied by the probability of low demand is the marginal revenue if demand is low, evaluated at quantity q. So, this expression can be rewritten as:

\Pr(H)MR_H(q) + \Pr(L)MR_L(q) = MC(q) .    (3.11)

The left‐hand side of this equation can be thought of as the expected marginal revenue; it’s the marginal revenue for each state of nature times the probability that each state of nature occurs. This represents the uncertainty and the fact that we must make our pricing decisions before demand is completely known. This expected marginal revenue is then balanced against marginal cost.



Example: Profit maximization with uncertainty when demand is known after production, choosing one quantity

Using the demand and cost conditions from the previous example (p_H = 20 − 0.25q_H, p_L = 10 − 0.25q_L, Pr(H) = 0.5, and C(q) = 2q), find the expected profit if the firm is unable to resolve uncertainty before choosing quantity.

Solution: Since we're only planning on selling one quantity, expected profit is

E(π ) = 0.5 (20 − 0.25q )q + 0.5 (10 − 0.25q )q − 2q .

Maximizing we obtain

\frac{dE(\pi)}{dq} = 0.5(20 - 0.5q) + 0.5(10 - 0.5q) - 2 = 0

10 - 0.25q + 5 - 0.25q = 2 \Rightarrow 13 = 0.5q \Rightarrow q = 26

If demand turns out high,

p_H = 20 - 0.25(26) = 13.50 , \quad \pi_H = (13.50 - 2)(26) = 299 .

If demand turns out low,

p_L = 10 - 0.25(26) = 3.50 , \quad \pi_L = (3.50 - 2)(26) = 39 .

So, expected profit is

E(π ) = 0.5(299) + 0.5(39) = 169 .
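The single-quantity approach is equally short to verify (a sketch under the example's assumptions):

# One quantity sold in both states:
# dE/dq = 0.5*(20 - q/2) + 0.5*(10 - q/2) - 2 = 15 - q/2 - 2 = 0.
q = 2 * (15 - 2)                            # 26
pi_H = (20 - q/4 - 2) * q                   # 299
pi_L = (10 - q/4 - 2) * q                   # 39
print(q, pi_H, pi_L, 0.5*pi_H + 0.5*pi_L)   # 26, 299, 39, 169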

Note that this is lower than the expected profit when the firm was able to postpone the pricing decision until after the uncertainty was resolved. So, what is wrong with the approach above? It is based on a potentially faulty assumption. We assumed that we would lower price until we sold everything we had produced, even when demand was low. If demand is low, but not too low, so that the price needed to sell out is not too low, that might make sense. But would the vendor always do that? In the extreme, suppose he had to lower price all the way to 0 to unload everything he had prepared if it was raining. Then, revenue would be 0. Absent outside influences not present in this example or model, no firm would ever want to sell more than the revenue-maximizing quantity. Further, the fact that we assumed the firm would



lower price enough to sell out when demand was low held down production, and, thus, the amount that could be sold if demand was high.

To answer the question correctly, we must recognize that there is no reason to sell the same quantity when demand is low as when demand is high, and no reason we have to sell everything we produced. In some scenarios, we can store extra output, at some cost, as inventory. Inventory accumulation has future benefits because it saves on next period's production costs. But holding inventory has costs. With inventory, those benefits and costs must be accounted for in the profit function before optimizing. If inventory is too expensive, it may make more sense to throw out unused output. In the hot dog vendor example, the food is perishable and must be disposed of if not used. Generally, there may be disposal costs associated with throwing output out. If so, they must be accounted for in the profit function. For the hot dog vendor example, it is sensible to assume free disposal – that it does not cost anything significant to throw some hot dogs and buns in the dumpster or to feed them to the dog when the vendor gets home.

At this point, a student often raises the following objection: why throw hot dogs out instead of getting at least something for them? Set a price that raises some revenue, sell what you can, and then lower the price to get rid of the rest. They have in mind something like what happens to unsold Halloween candy the day after Halloween. The problem with that line of thinking lies in a misconception of the definitions of the product and the market. Products must be thought of in terms of all of the characteristics needed to meet a customer's intended use, including location and timing. Just as available land in Georgia does me no good if I am building a house in Gainesville, candy the day after Halloween does me no good if I need it to hand out to kids on Halloween. So, the store can lower the price the day after Halloween without too much effect on demand before Halloween. On the other hand, if it made a habit of marking down Halloween candy a few days before Halloween, all its customers would just wait until the markdown to buy. Similarly, if hot dog vendors sold at a high price for 45 minutes after the start of the lunch rush and then cut the price, many customers would anticipate this and wait for the price to be cut. So, that strategy is self-defeating. Another way to view it is that this form of price cutting is an attempt at price discrimination – trying to charge a high price to those willing to pay it and a lower one to other customers. That only works if the vendor can tell the customers who are willing to pay a high price apart from those who are not up front, and can get away with segmenting the market and charging discriminatory prices. The bottom line is that, for a product with given physical characteristics at a given location and over a narrow time interval, unless there is a way to explicitly price discriminate, the ability of customers to wait for the price cut before purchasing forces the seller to pick a single point on the market demand curve and stick to it.

So, let's return to the hot dog vendor's problem. He has to decide how many hot dogs to produce, q, how many to sell if demand is high, qH, and how many to sell



if demand is low, qL. That seems like three choices to make. And, in some sense, it is. But, we can simplify a lot by making a couple of simple observations. First, it would be a bad idea to make more than the most that the vendor would ever want to sell. Second, the vendor will never want to sell more when demand is low than they want to sell when demand is high. So, we know the level of production, and therefore cost, are determined by the high demand sales target. If we produce qH and demand turns out to be low, we have plenty of units on hand to meet that demand; if demand turns out to be high, we’ve produced exactly the right amount. Expected profit then becomes:

E(\pi) = \Pr(H)p_H(q_H)q_H + \Pr(L)p_L(q_L)q_L - C(q_H) .    (3.12)

We are, quite reasonably, assuming that we would never want to sell more when demand is low than when demand is high. We therefore built our expression for expected profit on the assumption that qH ≥ qL . However, there is nothing in equation (3.12) to guarantee that when we maximize it, we will end up with qH ≥ qL . So, once we have maximized this expected profit, we will need to check that our solution actually satisfies that constraint. If it does not, we will have to go back to the drawing board. We will return to why this might happen mathematically and what to do about it later. For now, there are two choice variables in the vendor’s expected profit function. Maximizing with respect to qL gives:

\frac{\partial E(\pi)}{\partial q_L} = \Pr(L)\,MR_L = 0 .    (3.13)

From that, it follows that MRL = 0 . What about marginal cost? We are assuming we want to sell more when demand is high than when demand is low. Therefore, once it is time to actually sell the hotdogs, we will have produced more than we plan to sell if demand is low. So, if demand is low, we will have extra output just sitting around. The marginal cost of getting another unit to sell if demand is low is zero! So, if we actually want to sell more when demand is high than when demand is low, we just maximize revenue if demand turns out to be low. Maximizing with respect to qH we get

\frac{\partial E(\pi)}{\partial q_H} = \Pr(H)\,MR_H - MC(q_H) = 0 .    (3.14)

Notice how the marginal revenue at high demand periods is weighted by the probability we will experience high demand, whereas marginal cost is not weighted by any probability. This is because we will always produce qH, even though we may not actually end up selling all of it.



Example: Profit maximization with uncertainty when demand is known after production, and restricting low demand quantity

Using the previous demand curves ( pH = 20 − 0.25 qH and pL = 10 − 0.25 qL ), probability estimates, and constant marginal cost given, what is the firm’s maximum expected profit if it has to produce before demand is known? Solution: The firm should produce what it expects to sell if demand is high, and restrict its quantity sold if demand is low. So, it should maximize its expected profit:

E (π ) = 0.5 (20 − 0.25 qH )qH + 0.5 (10 − 0.25 qL )qL − 2(qH )

The quantity for high demand times is

\frac{\partial E(\pi)}{\partial q_H} = 0.5(20 - 0.5q_H) - 2 = 0

20 - 0.5q_H = 4 \Rightarrow q_H = 32

and the quantity for low demand times is

\frac{\partial E(\pi)}{\partial q_L} = 0.5(10 - 0.5q_L) = 0

10 - 0.5q_L = 0 \Rightarrow q_L = 20

Since qH ≥ qL , this is a reasonable solution. Therefore, the expected profit is

E (π ) = 0.5 (20 − 0.25(32))(32) + 0.5 (10 − 0.25(20))(20) − 2(32) = 178
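The two-quantity solution, with the qH ≥ qL check, looks like this in code (our sketch of the technique, not a general-purpose solver):

# Produce qH; if demand is low, sell only qL and dispose of the rest.
# FOCs: Pr(H)*MRH(qH) = MC and MRL(qL) = 0, with MR = a - q/2 here.
prH, mc = 0.5, 2.0
qH = 2 * (20 - mc / prH)     # 0.5*(20 - qH/2) = 2  ->  qH = 32
qL = 2 * 10                  # 10 - qL/2 = 0        ->  qL = 20
assert qH >= qL              # the working assumption holds

e_pi = prH*(20 - qH/4)*qH + (1 - prH)*(10 - qL/4)*qL - mc*qH
print(qH, qL, e_pi)          # 32, 20, 178.0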

Notice that this profit is higher than when we sold the same quantity at both high and low demand periods, and lower than when the uncertainty was resolved before the production decision was made. It is instructive to compare the solutions implied by equations (3.13) and (3.14), for the case where uncertainty is not resolved before the production decision is made, to the choices that would be made if complete information were available at the time of the production decision. The quantity to sell when demand is low equated marginal revenue and marginal cost with complete information, but maximizes revenue (sets marginal revenue to zero) with uncertainty. Thus, more is sold when demand is low with imperfect information. When demand is high with uncertainty, a fraction of marginal revenue equal to the probability of high demand is equated to marginal cost in determining production levels. Thus, sales are lower with high demand when



uncertainty exists at the time of production than they would be without the uncertainty. Basically, the presence of uncertainty is causing underproduction relative to the most profitable high demand output and overproduction relative to the most profitable low demand output. Some of that overproduction at low demand is sold and some is disposed of.

[Figure: inverse demands pH and pL with MRH, MRL, Pr(H)MRH, and MC; complete-information solutions marked *, uncertainty solutions marked U.]

The graph shows both solutions: when uncertainty is resolved prior to the production decision and when it is not. Prices and quantities marked with a * represent the solution with complete information. The solution with uncertainty is denoted with a superscript U. The graph mirrors the math and discussion above. With certainty, production occurs where marginal cost crosses the relevant marginal revenue. With uncertainty, production occurs where the probability weighted marginal revenue when demand is high crosses marginal cost, at qHU, and price is correspondingly pHU. The effective marginal cost is then 0 at low demand, so if demand is low, sales occur where marginal revenue equals marginal cost, which is zero in this case, at qLU, and price is pLU. It is obvious from the graph that, as compared to certainty, the low demand quantity is higher, the low demand price is lower, the high demand quantity is lower, and the high demand price is higher. Since we are diverging from what would be chosen if we had complete information when the production decision was made, expected profit must be lower.

The solution technique outlined above works fine so long as the solution for the quantity produced to sell when demand is high actually exceeds the solution for the quantity to sell when demand is low. If the solution does not satisfy that condition, it contradicts the assumption underlying our formulation of the problem – we will not have produced enough to sell more when demand is low than when demand is high. How might we end up in a situation where the solution contradicts the assumption that qH ≥ qL? It can be seen readily in the figure. As marginal cost gets higher or the probability weighted high demand marginal revenue gets closer to the low demand marginal revenue, the difference between qHU and qLU gets smaller. So, if marginal cost or low demand is high enough, or if high demand or the probability of high demand is low enough, this solution technique will yield a solution where qH < qL.



What should we do in that case? We can never actually sell more than was produced. Further, we would never sell more when demand is low than when demand is high. Therefore, if qL is not less than qH, at most, qL will equal qH. So, if assuming that qH ≥ qL will be satisfied does not work, we must impose the constraint on the problem that qH = qL. In that case, everything produced is sold whether demand turns out to be high or low. Thus, this is the case in which the set up of equation (3.10) is correct, and the solution is that implied by equation (3.11). The situation and the solution are shown in the figure to the right. The quantity at which the probability weighted high demand marginal revenue equals marginal cost, qHX, is less than the quantity that maximizes revenue when demand is low, qLX. The solution is to equate the sum of the probability weighted marginal revenues with marginal cost, which occurs at the quantity labeled qU. If demand is low, the price charged is pLU, and, if demand is high, it is pHU.

[Figure: when the unconstrained solution gives qHX < qLX, the constrained optimum qU is where Pr(H)MRH + Pr(L)MRL crosses MC; prices pHU and pLU are read off the high and low inverse demand curves.]



Example: Profit maximization with uncertainty when demand is known after production – the constrained case

Suppose pH = 20 − 0.25 qH , pL = 10 − 0.25 qL , C(q)=3q, and Pr(H)=0.2. How much should the firm sell, and what price should they charge at high and low demand? Solution: First, allow for the possibility that the quantity sold when demand is low is less than produced.

E (π) = 0.2 ( 20 − 0.25qH ) qH + 0.8 (10 − 0.25qL ) qL − 3qH

At high demand:

\frac{\partial E(\pi)}{\partial q_H} = 0.2(20 - 0.5q_H) - 3 = 0

20 - 0.5q_H = 15 \Rightarrow q_H = 10

At low demand:

\frac{\partial E(\pi)}{\partial q_L} = 0.8(10 - 0.5q_L) = 0

10 − 0.5qL = 0

qL = 20 .

Since qH < qL , this is the wrong approach to this problem. So, assume all output is sold whether demand is high or low and work the problem again.

E (π) = 0.2 ( 20 − 0.25q ) q + 0.8 (10 − 0.25q ) q − 3q

Maximizing:

\frac{\partial E(\pi)}{\partial q} = 0.2(20 - 0.5q) + 0.8(10 - 0.5q) - 3 = 0

12 - 0.5q = 3 \Rightarrow q = 18

The prices that sell 18 units in each state are p_H = 20 - 0.25(18) = 15.50 and p_L = 10 - 0.25(18) = 5.50.
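In code, the fallback is just a second solve guarded by the constraint check (a sketch with this example's MC = 3 and Pr(H) = 0.2):

# First attempt: separate qH and qL, as before.
prH, mc = 0.2, 3.0
qH = 2 * (20 - mc / prH)     # 0.2*(20 - qH/2) = 3  ->  qH = 10
qL = 2 * 10                  # qL = 20: violates qH >= qL

if qH < qL:
    # Impose qH = qL = q: prH*(20 - q/2) + (1 - prH)*(10 - q/2) = mc.
    q = 2 * (prH * 20 + (1 - prH) * 10 - mc)   # 2*(12 - 3) = 18
    print(q, 20 - 0.25 * q, 10 - 0.25 * q)     # 18, 15.5, 5.5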



Value of Information with Continuous Decisions

In Chapter 1, in the context of discrete decisions, we saw how acquiring additional information can increase expected profit if it allows a decision maker to update their probability estimates. We can now revisit the question of information value in a context where the decision to be made is continuous – namely, what output to choose to maximize profit when there is some uncertainty about some determinant of profit. It is easy to determine the value of perfect information by comparing the expected profit that would result with perfect information to expected profit with whatever information is at hand. A much more interesting question is how valuable additional but still imperfect information is to a decision maker, since imperfect information is all they are likely to actually have access to. An example of imperfect information is a consultant's report about the likelihood of success of a new product line. It is imperfect because, despite his expertise, the consultant is unable to make a 100% accurate assessment of the future. However, it is possible that the report can still be valuable to the firm because it (presumably) provides additional information that the firm does not have otherwise.

We will consider information in the context of maximizing profit while facing uncertain demand. The effects of cost uncertainty would be treated in a similar fashion. As in Chapter 1, we assume the additional report results in a signal of either good news or bad news, where good news leads us to revise our initial assessment of the chance of high demand, Pr(H), upward to Pr(H|GN), and bad news leads us to revise it downward to Pr(H|BN). The more reliable we think the information is, the more it will affect our probability assessments and the more valuable it will be. The most objective possibility for using additional information to update occurs when we have a sample of reports from the same source for very similar situations in the past which we think forms a valid basis to make inferences about the future. This is illustrated in the example below. Generally, though, the process must involve some degree of subjective judgment on the part of the decision maker. Better decision makers are better at using all the information available to them, including their gut feel or intuition after meeting with the consultants providing the additional information, to formulate their probability estimates.

Once we have the report, and a new assessment of the probability of high demand, the problem is just like the one in the previous section: choosing the quantity to sell at high demand and at low demand to maximize expected profit based on the new probability of high demand. The solution will be different with good news than with bad news, and both of those solutions will differ from the solution with no additional report. We now present an extended example to make these ideas more concrete. This is a continuation of the example from the previous section.



Example: Value of Imperfect Information Part 1 – Updating Probability Estimates

Suppose that a manager is uncertain about whether demand will be high or low, and is considering buying a forecast to improve his assessment of the future. The manager wants to determine how reliable a new report will be, and he has data from previous forecasts he has purchased. Forty percent of the time, the forecast was good news and demand was high, and forty percent of the time the forecast was bad news and demand was low. This is represented in the table below. What is the probability the firm will actually experience high demand if it receives a good forecast? What is the probability that it will experience high demand if it receives a bad forecast? If the firm buys a new forecast, what is the probability that it will return good news?

                 Demand
                 High    Low
Report    GN     0.4     0.1
          BN     0.1     0.4

Solution: Looking at the table, we see that 40% of the time the report gave good news and demand turned out to be high. Also, we see that 50% of the time the report gave good news (0.4 + 0.1, the sum of the first row). Thus, the probability of a high demand period given a good report is

\Pr(H \mid GN) = \frac{0.4}{0.5} = 0.8

Similarly, the probability of high demand given bad news is

\Pr(H \mid BN) = \frac{0.1}{0.5} = 0.2

Finally, we see that 40% of the time the report gave good news when demand turned out to be high, and 10% of the time the report gave good news and demand turned out to be low; so, the probability that the report will return good news is

Pr(GN ) = 0.4 + 0.1 = 0.5
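The updating arithmetic comes straight from the joint frequency table. A sketch (the dictionary layout is our own choice):

# Joint frequencies of (report, demand) from the manager's past forecasts.
joint = {("GN", "H"): 0.4, ("GN", "L"): 0.1,
         ("BN", "H"): 0.1, ("BN", "L"): 0.4}

pr_gn = joint[("GN", "H")] + joint[("GN", "L")]    # 0.5
pr_h_given_gn = joint[("GN", "H")] / pr_gn         # 0.8
pr_h_given_bn = joint[("BN", "H")] / (1 - pr_gn)   # 0.2
print(pr_gn, pr_h_given_gn, pr_h_given_bn)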

Part 2 ‐ Finding Expected Profit for Each Report

Continuing from the example in the previous section, a firm faces uncertain demand where inverse demand for high demand periods is p_H = 20 − q_H/4, and inverse demand for low demand periods is p_L = 10 − q_L/4. The firm's marginal cost is constant at $2. What is the firm's expected profit with good news and with bad news?

Solution: From above, the probability of high demand with a good report is 0.8, which means the probability that demand will be low is 1 − 0.8 = 0.2. The expected cost is simply the cost of producing q_H units, since (as described in the



previous section) we want to make sure we have enough inventory on hand to sell if demand turns out to be high. Thus, expected profit given good news is

E(\pi \mid GN) = 0.8\left(20 - \frac{q_H}{4}\right)q_H + 0.2\left(10 - \frac{q_L}{4}\right)q_L - 2q_H ,

subject to the constraint qH ≥ qL . The constraint must hold, since we are associating all of our costs with the high demand period. Maximizing, we find the quantities as follows.

\frac{\partial E(\pi \mid GN)}{\partial q_H} = 0.8\left(20 - \frac{q_H}{2}\right) - 2 = 0

20 - \frac{q_H}{2} = 2.5 \Rightarrow \frac{q_H}{2} = 17.5 \Rightarrow q_H = 35

\frac{\partial E(\pi \mid GN)}{\partial q_L} = 0.2\left(10 - \frac{q_L}{2}\right) = 0

10 - \frac{q_L}{2} = 0 \Rightarrow q_L = 20

Since the constraint qH ≥ qL is satisfied, we can use these quantities to find expected profit given good news:

E(\pi \mid GN) = 0.8\left(20 - \frac{35}{4}\right)(35) + 0.2\left(10 - \frac{20}{4}\right)(20) - 2(35) = 265

Similarly, with bad news the probability of high demand is 0.2 and the probability of low demand 0.8. Thus, expected profit given bad news is

E(\pi \mid BN) = 0.2\left(20 - \frac{q_H}{4}\right)q_H + 0.8\left(10 - \frac{q_L}{4}\right)q_L - 2q_H ,

subject to the constraint qH ≥ qL . We find the quantities as follows.

\frac{\partial E(\pi \mid BN)}{\partial q_H} = 0.2\left(20 - \frac{q_H}{2}\right) - 2 = 0

20 - \frac{q_H}{2} = 10 \Rightarrow q_H = 20



\frac{\partial E(\pi \mid BN)}{\partial q_L} = 0.8\left(10 - \frac{q_L}{2}\right) = 0

10 - \frac{q_L}{2} = 0 \Rightarrow q_L = 20 .

Since the constraint is met, we can plug these quantities back into our profit function to find the expected profit given bad news.

E(\pi \mid BN) = 0.2\left(20 - \frac{20}{4}\right)(20) + 0.8\left(10 - \frac{20}{4}\right)(20) - 2(20) = 100

How then do we determine the value of the report? It is the difference in expected profit with and without the report. In turn, expected profit with the report is just the probability of good news, Pr(GN), times expected profit with good news, E (π | GN ) , plus the probability of bad news Pr(BN), times expected profit with bad news, E (π | BN ) . Letting E (π | Info) represent profit with the additional information, this is

E(\pi \mid Info) = \Pr(GN)E(\pi \mid GN) + \Pr(BN)E(\pi \mid BN) .

Letting E(\pi \mid NoInfo) represent expected profit with no information, the value of information is

Value = E(\pi \mid Info) - E(\pi \mid NoInfo) .

Thus, once we have determined the maximum expected profit for each possible probability of high demand, the value of information is determined exactly as in Chapter 1.

Example: Value of Imperfect Information Part 3 – Finding the Value of the Information

Using the data from the previous examples, calculate the expected profit of the firm if they obtain a report. Then, determine the value of the information. Solution: We know the report has a 50% chance to return good news (which means it also has a 50% chance to return bad news). We also know that the firm’s expected profit if they buy a report and receive good news is 265, whereas the expected profit if the report returns bad news is 100. Thus, the firm’s expected profit if they buy a report is

E(π | Info) = 0.5(265) + 0.5(100) = 182.50

The value of the information is how much higher the firm expects profits to be with it than without it. From the examples in the previous section, we know the



highest profit the firm can expect to receive without any additional information is 178. Thus, the value of the information is

Value of Info = 182.50 − 178 = 4.50.

The decision of whether or not to buy additional information can be illustrated using a decision tree:

[Decision tree: “Don’t Buy” yields E(π) = 178. “Buy” leads with probability 0.5 to Good News, E(π) = 265, and with probability 0.5 to Bad News, E(π) = 100, for an expected value of 182.50.]

If the firm doesn’t buy the information, they will expect to make 178 in profit; if they buy it, they will expect to make 182.50. So, they will buy the additional information if its cost is less than 4.50.

In this sort of problem, the more the report causes you to upgrade your assessment of the chance of high demand, the more you will produce, the less likely it is that you will sell everything produced if demand turns out to be low, and the more likely it is that you will sell the quantity that sets marginal revenue equal to zero and dispose of the rest. A corollary: if you get a favorable signal but demand turns out to be low and you still sell everything produced (that is, you do not set marginal revenue equal to zero), then you certainly would have sold everything produced when demand was low had you not bought the report or had you received an unfavorable signal, because you would have produced less. Similarly, the more the report causes you to downgrade your assessment of the chance of high demand, the less you will produce, and the more likely it is that you will sell everything produced if demand turns out to be low. A corollary: if you get an unfavorable signal before production but you still do not sell everything produced when demand turns out to be low, then you certainly would not have sold everything produced had you not bought the report or had you received a favorable signal – you will sell the quantity that sets marginal revenue equal to zero with low demand in all cases.

In summary, imperfect information is still valuable. The point of a forecast or a consultant’s report is not to give you a definitive summary of what will happen in the future – no one can do that. Instead, a forecast is valuable to the extent that it improves your best guesses about the probabilities of future events and therefore changes your choice of production level.
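The whole calculation is easy to script. The following is a minimal sketch in Python (the function and variable names are ours, not the text’s); it assumes the interior solution in which the posterior probability of high demand is large enough that the constraint q_H ≥ q_L does not require a joint constrained optimization.

```python
# Minimal sketch of the value-of-imperfect-information example above.
# Inverse demands: p_H = 20 - q_H/4 and p_L = 10 - q_L/4; marginal cost is 2,
# and all production cost is attached to q_H (so q_H >= q_L must hold).

def optimal_expected_profit(prob_high, mc=2.0):
    """Maximize p*(20 - qH/4)*qH + (1-p)*(10 - qL/4)*qL - mc*qH over qH, qL."""
    p = prob_high
    q_high = 2 * (20 - mc / p)   # from p*(20 - qH/2) - mc = 0
    q_low = min(20.0, q_high)    # from (1-p)*(10 - qL/2) = 0, clipped so qH >= qL
    profit = (p * (20 - q_high / 4) * q_high
              + (1 - p) * (10 - q_low / 4) * q_low
              - mc * q_high)
    return profit

e_gn = optimal_expected_profit(0.8)   # good news: Pr(high) = 0.8 -> 265.0
e_bn = optimal_expected_profit(0.2)   # bad news:  Pr(high) = 0.2 -> 100.0
e_info = 0.5 * e_gn + 0.5 * e_bn      # report equally likely good or bad
value = e_info - 178.0                # 178 = best expected profit with no report
print(e_info, value)                  # 182.5, 4.5
```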



Chapter 3 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

3rd Degree Price Discrimination – Charging separate groups different prices for a product based on their willingness to pay for that product. The firm must be able to identify and separate these groups of customers, it must be feasible to charge different prices, and resale must not be possible.

Peak-Load Pricing – A technique of charging different prices when demand differs by time of day or season. At the time the firm experiences its peak demand, customers have a higher willingness to pay, so the firm can charge higher prices, but it must incur a per-unit capacity cost. At off-peak times, the firm sets prices lower in relation to the lower willingness to pay, and incurs no capacity cost.



Part 2 Empirical Approximations and Econometrics



Chapter 4 Estimating and Interpreting Approximations

Economics is a set of tools that, among other things, allows us to make inferences about the consequences of various actions. These inferences are based on models rooted in economic theory and tested against real-world data. Because economics deals with constantly changing individuals and markets, expecting complete or perfect data is usually unrealistic; but, as we saw in Chapter 1, inferences can be improved through supplemental information, even if imperfect.

One way that we can model economic phenomena is to approximate empirical relationships among historical data, using economic theory as a guide. For example, we can estimate a firm’s demand curve by collecting data on different prices charged and the resulting quantities sold over the past several years, and then attempting to describe this relationship using an equation. Once we have a function that potentially describes a certain relationship between two variables, we can test this function’s reliability by comparing predictions of the model to external data that wasn’t used in constructing the function.

To illustrate how these models can be useful, let’s look at the following scenario. Suppose Little and Small Inc. currently sells 60 units at a price of $4 per unit. They then lower their price to $3, and as a result sell 70 units. Their cost function is C = 5 + 2q. What should the manager set price to in order to maximize profit? First, let’s look at Little and Small Inc.’s profit at each price. At $4, profit is

π = 4·60 − 5 − 2·60 = 115

and at $3 profit is

π = 3·70 − 5 − 2·70 = 65

So, they’d be better off at a price of $4 than at $3. Can they raise profits even more by charging a different price? If so, would that price be higher than $4, or somewhere between $3 and $4? The graph below illustrates why there is not an obvious answer to this question.

[Figure: profit (π) plotted against quantity – the two observed points at q = 60 and q = 70 leave the location of maximum profit ambiguous.]

In order to see what the profit hill looks like, we need a demand approximation. In Chapter 2, we introduced linear and log-linear approximations and how they can be used to represent a firm’s demand curve. The coefficients of the variables in these approximations are what actually determine the relationship between the variables – but how are they found? In the next two sections, we will see how we can estimate these coefficients (parameters) given two points of data. From there, we generalize to estimating the coefficients when we have many data points.




Fitting a Linear Demand Approximation with 2 Points

Recall the main assumption when using a linear demand approximation is that the slope of the demand curve is constant. Basically, given any two observed points (p0, q0) and (p1, q1) on the demand curve, we are simply looking for the line that passes through them, as illustrated in the figure to the right. [Figure: observed points (q0, p0) and (q1, p1) with the linear approximation p(q) passing through them.] The first step in deriving a linear expression for our two data points is to calculate the slope between the two points; we can then use this to define the approximation using either original data point. Once we have an approximation of the demand curve, we can calculate marginal revenue and get an estimate of the profit-maximizing price and quantity.

Example: Fitting and Using a Linear Approximation with Two Points

Using Little and Small Inc.’s two observed data points, (60, $4) and (70, $3), estimate an inverse linear demand curve for the firm. Solution: First, we need to find the slope of the inverse demand curve, which we are assuming constant. The slope is Δp/Δq, or

b = (4 − 3)/(60 − 70) = −0.1

The slope tells us that from any starting point, the change in price is ‐0.1 times the change in quantity, or

Δp = −0.1·Δq.

If we choose a price of 4 and a quantity of 60 as the starting point, and then move along the demand curve to any other quantity, q, the resulting price, p, becomes

(p − 4) = −0.1(q − 60).

This can easily be rearranged to express the inverse demand relationship more concisely as follows.



p = 4 − 0.1(q − 60)
p = 4 − 0.1q + 6
p = 10 − 0.1q

Given the two points we observed, we assumed the slope of the curve was constant and found the line that passed through the points, as illustrated in the figure to the right.

Example: Given the linear demand function found in the previous example and a cost function of C = 5 + 2q, find the optimal price and associated profit.

Solution: Set up the profit function, and maximize. Profit is

π = (10 − 0.1q)q − 5 − 2q . Maximizing it:

dπ/dq = 10 − 0.2q − 2 = 0
8 − 0.2q = 0
0.2q = 8
q = 40

To find the price, plug this quantity back into the inverse demand approximation.

p = 10 − 0.1(40) = 6

At $6, profit is

π = 6(40) − 5 − 2(40) = 155

Based on the example, it seems that we could conclude it would profit the firm to charge a price of $6. However, this estimate of the profit-maximizing price was based on a very limited amount of data. Our major assumption when determining the profit-maximizing price was that if price falls by 0.10, quantity increases by 1, and that this relationship holds at all prices. In fact, we’ve only observed data for prices around $3 and $4; we really don’t have any idea what will happen if we charge $6. This common mistake is called extrapolating beyond the data range, and we will revisit it in detail later in the chapter. So, what our model really suggests is that the profit-maximizing price may well be higher than $4, but there simply is not enough information to say with any certainty that it should, in fact, be $6.
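To make the mechanics concrete, here is a minimal Python sketch of the two-point fit and the (tentative) profit maximization; the variable names are ours, not the text’s, and the extrapolation caveat above still applies.

```python
# Minimal sketch: fit a linear inverse demand through two observed points and
# tentatively maximize profit with cost C = 5 + 2q.
(q0, p0), (q1, p1) = (60, 4.0), (70, 3.0)

b = (p0 - p1) / (q0 - q1)      # slope of inverse demand: -0.1
a = p0 - b * q0                # intercept: 10, so p = 10 - 0.1q

# Profit is (a + b*q)*q - 5 - 2q, so dπ/dq = a + 2*b*q - 2 = 0 at the optimum.
q_star = (2 - a) / (2 * b)     # 40
p_star = a + b * q_star        # 6
profit = p_star * q_star - 5 - 2 * q_star
print(q_star, p_star, profit)  # 40.0 6.0 155.0
```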



Fitting a Log-Linear Demand Approximation with 2 Points

By using a log-linear model to approximate demand, we are assuming the elasticity is the same at any price. From Chapter 2, the general formula for a log-linear approximation is

q_D = a·p^b    (4.1)

where a is a coefficient that captures the scale of demand and b is the price elasticity of demand. With two points on the demand curve, (p0, q0) and (p1, q1), we can solve for the unknown parameters (a and b) that make the approximation go through the two points, as illustrated in the figure to the right. [Figure: observed points with the log-linear approximation p(q) passing through them.] The first step in deriving a log-linear approximation through our two data points is to calculate the elasticity; we can then use this to define the approximation using either original data point. Once we have an approximation of the demand curve, we can calculate marginal revenue and get an estimate of the profit-maximizing price and quantity.

Example: Log-Linear Demand: Finding Demand

Using Little and Small Inc.’s two observed data points, (60, $4) and (70, $3), estimate a log-linear demand approximation.

Solution: First, we need to find the elasticity given these two points, assuming it is constant; we denote it η (the exponent b in equation (4.1)). We can plug both points into equation (4.1).

60 = a·4^η
70 = a·3^η

Since a and η are both constants, we can solve this system of equations for the two unknowns, a and η. Dividing the first equation by the second on each side, then solving, we obtain the following.

60/70 = (a·4^η)/(a·3^η)
6/7 = (4/3)^η



ln(6/7) = η·ln(4/3)
η = ln(6/7)/ln(4/3) = −0.5358

To find a, plug η back into either equation and solve:

60 = a·4^(−0.5358)
a = 126.11

So, our log‐linear demand approximation is

q = 126.11·p^(−0.5358)

The graph below shows the log‐linear demand curve from the example. Elasticity of demand is constant along the entire curve but slope depends on what price is being charged. At this point, since we’re assuming elasticity is constant, and since marginal cost was constant at 2 ( C = 5 + 2q ), we can use the expression for the profit‐maximizing mark up from Chapter 2.

p* = (η/(1 + η))·MC
p* = (−0.536/(1 − 0.536))·2 = −2.31

This answer is absurd! Recall from Chapter 2 that when elasticity is less than 1 in absolute value, raising price increases profit. So, the log-linear approximation tells us only to raise price – not how far to raise it! The technical reason we get an absurd answer for price in the above example is that the second order conditions fail at the solution implied by our mark up formula. Intuitively, though, it should make sense. Remember, we are assuming constant elasticity, and we found demand to be inelastic (elasticity less than 1 in absolute value). What does it mean for demand to be inelastic? Inelastic demand means that for a 10% increase in price, quantity will decrease by less than 10%. Revenue will increase and cost will decrease – so it’s profitable to raise price. Since elasticity is assumed constant, it will always be profitable to raise price; so the manager should theoretically charge a price of infinity!

We know that constant elasticity was just an assumption used for our approximation. It is not literally true. In fact, demand tends to be inelastic at low prices and elastic at high prices. In using it, we hope only that over a relatively small



range of prices we are likely to charge that elasticity is relatively constant, so that the approximation is relatively accurate. However, if the price range we have is so low that we find demand is inelastic, the calculus of profit maximization, literally interpreted, suggests we should raise price A LOT, until demand is elastic. That large change in price means assuming elasticity is constant is not reasonable for the problem at hand. If we were to raise prices to a higher level and collect data again, the approximation would become useful if the range for which we had data included prices near the profit-maximizing price. Thus, again, it boils down to saying we should not use an approximation to extrapolate much beyond the range of observed data.

An additional word of caution is in order here before moving on. Time is a very important determinant of the elasticity of demand. If gas prices were to increase permanently by 100%, initially, consumer response would be relatively restrained. People have to get back and forth to work in the cars they already have, they have already scheduled appointments and trips, etc. Given more time to adjust, they can move closer to work, get a job closer to home, buy a more fuel efficient car, etc. In short, the substitution possibilities are higher in the long run than in the short run. A finding that demand is inelastic in the short run is no reason to believe it is inelastic in the long run. Before deciding that an inelastic response to a price increase means more price increases are in order, make sure that the long run response, not just the very short run response, is inelastic. If you raise price because short run demand is inelastic without realizing demand is more elastic in the long run, once consumers go through the trouble of finding substitutes for your product, you may not get them back if you lower prices later on.
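The log-linear fit can be sketched the same way. Below is a minimal Python illustration (our own variable names) that also reproduces the absurd markup result when estimated demand is inelastic.

```python
# Minimal sketch: fit a constant-elasticity demand q = a*p^eta through two
# points, then apply the markup rule p* = (eta/(1+eta))*MC from Chapter 2.
import math

(q0, p0), (q1, p1) = (60, 4.0), (70, 3.0)

eta = math.log(q0 / q1) / math.log(p0 / p1)  # ln(6/7)/ln(4/3), about -0.5358
a = q0 / p0 ** eta                           # about 126.11

mc = 2.0
p_star = (eta / (1 + eta)) * mc              # about -2.31: absurd, since |eta| < 1
print(eta, a, p_star)
```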

Regression – Fitting the Best Approximation with Many Data Points

Above, we took two points of data and fit a curve through them, attempting to describe a relationship between two variables. The general shape of the curve was determined by our assumptions about what would make a reasonable approximation. The exact position of the curve was determined by the two data points – we chose the coefficients or parameters of the approximation so the curve would precisely fit the observed data by going through both points. With many data points, it is not possible to fit the approximation precisely to all the data points. Instead, we want to choose the coefficients of the approximation to fit the observed data as closely as possible.

Econometrics is the use of quantitative mathematical and statistical techniques to study economic phenomena. Regression analysis is the branch of econometrics concerned with fitting and evaluating empirical approximations of underlying economic relationships. To make things concrete, let’s consider an example. Suppose we want to know what determines variation in consumer electric bills. If the utility has only one price per kilowatt hour, the expenditure per household is just the price per kilowatt hour times the number of kilowatt hours consumed. Sometimes electric utilities have more complex price structures, where the price per unit increases or decreases with



the consumption level. Either way, suppose that the electric utility has held the price structure constant, and what we want to explain is what else drives electricity spending. One thing that comes immediately to mind is the size of the residence. The table below shows twenty (hypothetical) observations on the size of a residence and the electric bill incurred at that residence. We might think a very reasonable place to start is by plotting the data to see if the relationship looks linear or log-linear. Simple visual inspection of such a plot reveals two things. First, the data may well follow a general positive linear relationship between residence size and electricity expenditure. Second, no single line is going to fit with perfect precision. So, how do we choose an approximation to fit as closely as possible?

Obs   Bill ($)   Size (1000's Sq Ft)
 1     343.32          3.2
 2     299.21          3.4
 3     302.02          1.6
 4     167.94          1.2
 5     209.55          2.7
 6     367.80          3.1
 7     390.06          2.5
 8     398.46          3.2
 9     224.87          1.6
10     313.27          3.0
11     209.36          1.6
12     355.15          3.1
13     344.13          1.7
14     453.55          3.7
15     184.73          1.2
16     372.66          3.8
17     264.10          2.2
18     325.37          2.6
19     204.54          1.8
20     205.89          2.2

To begin to answer that question, we have to introduce the notion of an approximation error. What we have in mind is that the bill is a linear function of size, plus some component that we are unable to model or explain which we take as random for our purposes. Thus, we are positing the following relationship:

Bill_i = β_0 + β_1·Size_i + ε_i.    (4.2)

Use of the lower case Greek beta (β) to represent regression coefficients is ubiquitous in applied regression analysis. β_0, or “beta zero”, is the intercept and β_1, or “beta one”, is the slope. The subscript i refers to the observation and takes on any value from 1 (the first observation) to N (the last), where N is the total number of observations. The lowercase Greek epsilon (ε) represents the (hopefully small) random component outside the model.

We never observe the true values of the coefficients. Instead, we estimate them. We denote the estimates by placing a caret, or a “hat”, over the betas. So β̂_1, or beta one hat, is the estimated value of β_1. We also do not observe the value of the random component, by definition. Instead, we observe an approximation error, which reflects both that there is a random component we do not observe and the fact that



we have only estimates of the coefficients. The approximation error is the difference between the actual electric bill and the one we would predict based on our imperfectly estimated coefficients. That is, the predicted bill is

B̂ill_i = β̂_0 + β̂_1·Size_i    (4.3)

and the approximation error, or residual, is

ε̂_i = Bill_i − B̂ill_i = Bill_i − β̂_0 − β̂_1·Size_i.    (4.4)

There are an infinite number of slope and intercept coefficients that might be chosen to approximate the data. Two possibilities, the lines labeled A and B, are shown in the figure below. Our goal is to choose values for the estimated coefficients so as to make the errors as small as possible overall. How to do that is the question. We could literally add up the errors and minimize their total. But, there is a serious problem with that approach, since positive and negative errors cancel each other out. For example, if one point has an error of 10 and another point has an error of −10, their sum cancels to 0, even though the total error is 20 in absolute value. So, really bad approximations might have a total error of 0 using this flawed approach.

The most common regression technique is called Ordinary Least Squares (OLS). The goal of an OLS regression is to produce a line (or curve) as close to the data points as possible by minimizing the sum of the squared errors, which are always positive. Squaring each error, the sum of squared errors, or SSE, is

SSE = Σ_i Error_i² = Σ_i (Bill_i − β̂_0 − β̂_1·Size_i)².    (4.5)

Choosing the coefficient estimates to minimize the SSE is a straightforward exercise in basic differential calculus and algebra. There is nothing magic about it, nothing that even requires a high level of expertise to understand. We simply take the partial derivative with respect to each of the coefficients and set them equal to zero:



∂SSE/∂β̂_0 = −2·Σ_i (Bill_i − β̂_0 − β̂_1·Size_i) = 0
∂SSE/∂β̂_1 = −2·Σ_i Size_i·(Bill_i − β̂_0 − β̂_1·Size_i) = 0.    (4.6)

Substitution shows that this could also be written as

∂SSE/∂β̂_0 = −2·Σ_i ε̂_i = 0
∂SSE/∂β̂_1 = −2·Σ_i Size_i·ε̂_i = 0.    (4.7)

Thus, the two equations show that the sum of the approximation errors is 0 and that the sum of the products of the approximation errors and residence size is also zero. Of course, actually solving these two equations for the two unknown parameters is messy, and much better done by computer. But, it is important to understand what the computer is doing when it calculates regression coefficients. It is solving what is in principle a simple calculus problem that anyone who passed survey of calculus should understand. It is just that the computations themselves get messy enough that it makes much more sense to have them performed by a statistical software package.

Performing ordinary least squares regression on the example we have been working with yields the results shown in the table below. This particular output is from Microsoft Excel, but most spreadsheet packages include a regression feature. For more involved work, statistical packages, such as STATA, are easier to use. But, all provide substantially the same information. While we will eventually make sense of all the information in the regression output, for now, let’s focus on the estimated coefficients.



Regression Statistics
  Multiple R           0.7501
  R Square             0.5627
  Adjusted R Square    0.5384
  Standard Error       56.37
  Observations         20

ANOVA
              df       SS       MS       F      P-value
  Regression   1     73580    73580    23.16    0.0001
  Residual    18     57190     3177
  Total       19    130770

              Coef    Std Err   t Stat   P-value   Lower 95%   Upper 95%
  Intercept  111.27    40.56     2.74      0.01      26.06      196.49
  Sq Ft       75.11    15.61     4.81      0.00      42.32      107.90

The estimated coefficients yield the following expression for the predicted electric bill:

Bill = 111.27 + 75.11·Size.    (4.8)

The interpretation is that every additional one thousand square feet of space is associated with a predicted increase of $75.11 in the electric bill. Taken literally, the intercept means a home with zero square feet would incur a bill of $111.27. Hopefully, we can all agree that that would be taking the model too literally. All it really means is that for common house sizes, the electric bill is best approximated by adding $111.27 to a charge of $75.11 per thousand square feet. The resulting fitted line, or approximation, is shown in the figure below.
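For readers who want to replicate the output, here is a minimal sketch using numpy’s least squares routine (our own script, not part of the text’s Excel workflow).

```python
# Minimal sketch: reproduce the OLS fit in equation (4.8) with numpy,
# using the 20 hypothetical observations from the table above.
import numpy as np

size = np.array([3.2, 3.4, 1.6, 1.2, 2.7, 3.1, 2.5, 3.2, 1.6, 3.0,
                 1.6, 3.1, 1.7, 3.7, 1.2, 3.8, 2.2, 2.6, 1.8, 2.2])
bill = np.array([343.32, 299.21, 302.02, 167.94, 209.55, 367.80, 390.06,
                 398.46, 224.87, 313.27, 209.36, 355.15, 344.13, 453.55,
                 184.73, 372.66, 264.10, 325.37, 204.54, 205.89])

X = np.column_stack([np.ones_like(size), size])      # X0 = 1 gives the intercept
beta_hat, *_ = np.linalg.lstsq(X, bill, rcond=None)  # minimizes the SSE
print(beta_hat)  # should be close to [111.27, 75.11]
```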



Of course, many things other than the size of a residence have important effects on electric bills, just as more than quantity affects cost and more than price affects demand. Generally, we want our approximations to allow for the effects of many variables. So, before we can add to our understanding of regression analysis, it is necessary to introduce some terminology and notation to let us talk about approximations involving many explanatory or independent variables in a very general way.

As we have noted before, it is conventional to let Y denote the dependent variable. We will also let X1, X2, …, XK denote K different independent variables, where a subscript ki may be used to index the kth independent variable for observation i (k takes on values between 1 and K, and i takes on values between 1 and N). A regression, then, attempts to explain or approximate Y using X1, X2, …, XK. For this reason, we will often refer to Y as an endogenous variable, since it is determined within the regression, and X1, X2, …, XK as exogenous variables, since they are determined outside of the regression. This is the ideal case. As their labels suggest, it is important for the independent variables to be truly exogenous; otherwise, the reliability of the regression is compromised. In economics, it is very difficult to determine which variables actually are exogenous and which are endogenous. In fact, it is often the case that two variables (such as price and quantity) influence each other and are therefore both endogenous. This problem is so significant that it requires an entire chapter to address properly; for now, we will mostly ignore it and assume the variable on the left-hand side of the equation is endogenous and the variables on the right-hand side are exogenous. With this notation, the general form for a linear regression model is

Y_i = β_0 + β_1·X_1i + β_2·X_2i + … + β_K·X_Ki + ε_i.    (4.9)

If we define a new explanatory variable, X0, that is equal to 1 for every observation, this can be written more compactly as

Y_i = Σ_k β_k·X_ki + ε_i.    (4.10)

The betas, β_0, β_1, etc., are the parameters we are trying to estimate. In order to approximate the parameters, regression analysis simply assumes that the equation we’ve described is a good approximation, and then finds the line that is closest to all of the points in the dataset. No regression is perfect. Theoretically, there exists an equation that would completely describe our dependent variable without error. By estimating the parameters in our approximation, we are attempting to recreate the “true” equation as closely as possible. Since we will never be perfectly accurate, we represent our estimated betas by placing carets, or hats, over them. The predicted value of the dependent variable, Ŷ, is determined by the estimated coefficients:



Ŷ_i = β̂_0 + β̂_1·X_1i + β̂_2·X_2i + … + β̂_K·X_Ki = Σ_k β̂_k·X_ki.    (4.11)

The regression error (approximation error) is the difference between the actual and predicted values of the dependent variable. So, the sum of squared errors (SSE) is

SSE = Σ_{i=1}^{n} (Y_i − Ŷ_i)².    (4.12)

This is equivalent to

SSE = Σ_{i=1}^{n} (Y_i − β̂_0 − β̂_1·X_1i − β̂_2·X_2i − … − β̂_K·X_Ki)²    (4.13)

or to

SSE = Σ_{i=1}^{n} (Y_i − Σ_{k=0}^{K} β̂_k·X_ki)².

In interpreting this, remember Y_i is the actual observation from the ith trial and Ŷ_i is the predicted value based on our estimated parameters. In short, our dependent variable (the Y’s) and independent variables (the X_1’s, X_2’s, etc.) are our observed data, and the parameters (β_0, β_1, etc.) are the unknown quantities that we are trying to estimate. To minimize SSE, simply set the partial derivative with respect to each parameter equal to zero:

∂SSE/∂β̂_0 = 0, ∂SSE/∂β̂_1 = 0, …, ∂SSE/∂β̂_K = 0.    (4.14)

For the intercept, taking the derivative of equation (4.13) yields:

∂SSE/∂β̂_0 = −2·Σ_{i=1}^{n} (Y_i − β̂_0 − β̂_1·X_1i − β̂_2·X_2i − … − β̂_K·X_Ki) = 0.    (4.15)

Substituting, this is just:

−2·Σ_{i=1}^{n} ε̂_i = 0
Σ_{i=1}^{n} ε̂_i = 0.    (4.16)

For any other parameter β̂_k, taking the derivative of equation (4.13) yields:

∂SSE/∂β̂_k = −2·Σ_{i=1}^{n} X_ki·(Y_i − β̂_0 − β̂_1·X_1i − β̂_2·X_2i − … − β̂_K·X_Ki) = 0.    (4.17)


(4.17)


−2·Σ_{i=1}^{n} X_ki·ε̂_i = 0
Σ_{i=1}^{n} X_ki·ε̂_i = 0.    (4.18)

This gives K+1 equations to solve for the K+1 unknown coefficients. The K+1 equations are that the sum of the approximation errors is zero, and that the K sums of the products of each independent variable with the approximation errors are all individually zero. These K+1 equations are referred to as the normal equations. There is now even more reason to use computers to perform the actual calculations compared to our one independent variable example above, but there is nothing mysterious, or even complicated, about the idea involved.

The most important assumption here is that all of the explanatory variables are exogenous and are completely uncorrelated with the unobserved random error component. If that is not the case, the calculation cannot sort out the effect of the included variables from the effects of the random error with which they are correlated. This is known as omitted variables bias. While we will touch on this lightly later in this chapter and the next, for the most part we will simply assume the explanatory variables are exogenous and uncorrelated with the random error component. A thorough discussion of omitted variables bias and what can be done about it must wait until Chapter 6.

To make this more concrete, let’s return to the electric bill example and add another explanatory variable. The table below now includes the monthly average temperature. It makes sense that, when temperatures are higher, electricity used for air conditioning will be higher than when temperatures are moderate. Of course, for houses with electric heat, expenses will be higher at colder temperatures too. That could cause the relationship to be U shaped, not linear. However, most of our data reflects moderate to warm temperatures; so, a positive linear relationship seems likely.

Obs   Bill ($)   Size (1000s)   Temp
 1     343.32        3.2         63
 2     299.21        3.4         51
 3     302.02        1.6         99
 4     167.94        1.2         60
 5     209.55        2.7         45
 6     367.80        3.1         95
 7     390.06        2.5         94
 8     398.46        3.2         97
 9     224.87        1.6         64
10     313.27        3.0         51
11     209.36        1.6         54
12     355.15        3.1         81
13     344.13        1.7         91
14     453.55        3.7         99
15     184.73        1.2         72
16     372.66        3.8         82
17     264.10        2.2         88
18     325.37        2.6         74
19     204.54        1.8         45
20     205.89        2.2         64

Running an OLS regression on this data yields the following results:

Bill = −43.66 + 64.07·Size + 2.48·Temp.    (4.19)

The interpretation is that the electric bill increases by $64.07 per thousand square feet, holding temperature constant, and by $2.48 per degree of temperature, holding size constant. The literal interpretation of the intercept is that the bill would be a negative $43.66 for a house with zero square feet if the temperature were zero degrees. Of course, that is nonsense. All it really means is that for normal



temperatures and sizes, you should subtract $43.66 from the sum of 64.07 times size in thousands of square feet and 2.48 times temperature to get the best fit to the data.

Notice how adding the temperature variable caused both the intercept and the coefficient on Size to change as compared to the initial estimates from the regression when only size was included. Why would that be? In part, it may be because of a positive correlation, or co-linearity, between size and temperature in our data. For some reason, houses tend to be larger where temperatures are higher. When temperature is left out, some of the effect of temperature is picked up by the coefficient on size, causing it to be higher than when temperature is included. More generally, both Size and Temp may be correlated with other variables that have been omitted from the model and lumped into the error component but which nonetheless influence the electric bill. Such omitted variables may cause the coefficients on the included variables to be biased if they are correlated with included variables and have their own direct effect on the dependent variable, because the included variables then proxy the effects of the omitted variables in addition to their own effects. Thus, including variables that had been omitted can change the coefficients on all the variables that had previously been included. Since the omitted variables are often unknown, so are their correlations with the included variables. So, the coefficients on the included variables can be biased in unpredictable ways if the included variables are correlated with omitted variables. That is why the assumption that the included variables are exogenous and uncorrelated with the error component is so important. Unfortunately, it never holds exactly. All we can hope for is that the included variables are not too correlated with variables that have been omitted, so that the bias in the estimated coefficients is not too large.

Interactions

What if we want to model an interactive relationship between two (or more) of the independent variables themselves? For example, in the above regression, we estimated that both house size and temperature positively affect the size of the bill. It would seem that if it takes more energy to cool a house when temperature rises, that additional cost would be higher still in larger houses. If we believe the effect of temperature is larger in bigger houses, we need an interaction term. An interaction term is a new independent variable in a regression that is simply the product of two or more other independent variables. So, if we wanted to capture the interactive effects of house size and temperature, our regression model would become

Bill_i = β_0 + β_1·Size_i + β_2·Temp_i + β_3·Size_i×Temp_i + ε_i    (4.20)

This new model adds the product of Size and Temp as the new variable, Size × Temp . To see the effects of this term, we can look at the partial derivative with respect to Temp (holding Size constant)



∂Bill/∂Temp = β_2 + β_3·Size    (4.21)

Including the interaction introduces a kind of non-linearity into the model, in that the effect of temperature (and of size) on the electric bill is no longer constant. As long as β_3 is positive, the effect of temperature on the electric bill is greater as house size becomes larger.

OLS is a kind of linear regression. So, it may occur to you to wonder if we can actually estimate this model, which has some sort of non-linearity, with a linear regression model. Well, we can. Linear regression models must be linear in the unknown values to be estimated – which are the coefficients. They do not need to be linear in the observed variables. Indeed, we can make any sort of transformation we would like to the underlying data. Once that is done, the independent and dependent variables become constants from the viewpoint of estimating the coefficients, which is done through solving a set of equations that is linear in the unknown coefficients.

The table below displays the dataset with our new variable, which is simply the product of Size and Temp. Running a new regression on this data, we obtain the following:

Bill_i = −82.51 + 79.57·Size_i + 3.01·Temp_i − 0.21·Size_i×Temp_i.    (4.22)

The interpretation is that the effect of a one degree increase in temperature on the bill is 0.21 lower for every 1,000 square foot increase in size. This is contrary to our expectations. What might explain this? Most likely, some sort of correlation between the new variable and something omitted from our model. One obvious candidate would be the age of the structure. Newer structures are both larger and better insulated, on average. Correlation between size and unmeasured insulation quality might overwhelm the tendency of cooling costs to rise faster with temperature in larger residences. If we included a measure of residence age or insulation quality, we might find a positive effect of the interaction between size and temperature.


Obs   Bill ($)   Size (1000s)   Temp   Size × Temp
 1     343.32        3.2         63       201.6
 2     299.21        3.4         51       173.4
 3     302.02        1.6         99       158.4
 4     167.94        1.2         60        72.0
 5     209.55        2.7         45       121.5
 6     367.80        3.1         95       294.5
 7     390.06        2.5         94       235.0
 8     398.46        3.2         97       310.4
 9     224.87        1.6         64       102.4
10     313.27        3.0         51       153.0
11     209.36        1.6         54        86.4
12     355.15        3.1         81       251.1
13     344.13        1.7         91       154.7
14     453.55        3.7         99       366.3
15     184.73        1.2         72        86.4
16     372.66        3.8         82       311.6
17     264.10        2.2         88       193.6
18     325.37        2.6         74       192.4
19     204.54        1.8         45        81.0
20     205.89        2.2         64       140.8
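In code, the interaction regression is just one more column in the design matrix. A minimal sketch, continuing the numpy example from earlier in the chapter (it assumes the size and bill arrays defined there are still in scope):

```python
# Minimal sketch: add temperature and the Size x Temp interaction and re-fit.
# Assumes the size and bill arrays from the earlier sketch are in scope.
import numpy as np

temp = np.array([63, 51, 99, 60, 45, 95, 94, 97, 64, 51,
                 54, 81, 91, 99, 72, 82, 88, 74, 45, 64], dtype=float)

X = np.column_stack([np.ones_like(size), size, temp, size * temp])
beta_hat, *_ = np.linalg.lstsq(X, bill, rcond=None)
# beta_hat should be close to [-82.51, 79.57, 3.01, -0.21] per equation (4.22)
```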


Categorical or Dummy Variables

Whereas the variables we’ve considered thus far have all been continuous, categorical, or dummy, variables are discrete. They allow us to capture whether certain discrete criteria are met or not. Examples include the effects of things such as race, gender, or holding an advanced degree. Dummy variables take on a value of 1 if a condition is true, and are 0 otherwise. In a way, this is another kind of non-linearity in that the effect of these variables on the dependent variable is not gradual; it is either all there or not there at all.

The number of dummy variables required to represent a categorical classification is one less than the number of possible categories. To capture the sex of a consumer, we would need one variable. It could be called Male and take on a value of 1 if the subject were male and 0 otherwise, or it could be called Female and take on the value 1 if the subject is female and 0 otherwise. Both variables, though, are not needed. If the subject is female, we know they are not male. If they are not female, we know they are male. For a more complex example, suppose we have data that classify highest degree earned into: 1) less than a high school diploma, 2) a high school diploma, 3) an Associate’s degree, 4) a Bachelor’s degree, and 5) higher than a Bachelor’s degree. We would need to introduce four binary variables to represent these five possible categories. Suppose we chose to “omit” the category “a high school diploma”. Anytime all of the other variables were 0, we would know that that observation corresponded to someone who had completed high school but had not completed any college level degrees.

To return to our electric bill example, we might reasonably suspect the presence of a swimming pool to affect the electric bill, due to the need to run a pool pump. We could add this to the regression with the inclusion of a categorical variable, Pool, which takes on the value of 1 if a pool is present at the residence and 0 otherwise. The model would then become

Bill_i = β_0 + β_1·Size_i + β_2·Temp_i + β_3·Size_i×Temp_i + β_4·Pool_i + ε_i.    (4.23)

The coefficient on Pool, β_4, represents the addition to the bill when a pool is present. When no pool is present, Pool is 0, so there is no addition to the bill. Thus, our estimate for β_4 represents the average change in the electric bill due to the presence of a pool. The table below shows the new dataset. Running this new regression produces the following result:

Bill_i = 59.64 + 20.12·Size_i + 1.62·Temp_i + 0.15·Size_i×Temp_i + 74.93·Pool_i.    (4.24)

On average, the presence of a pool adds $74.93 to the electric bill.



Note the changes in the other parameter values. In particular, the coefficients on Size and Temp are much smaller and the coefficient on the interaction of Size and Temp is now positive, at 0.15. So, controlling for the presence of a pool, the effect of a one degree increase in temperature on the electric bill is 0.15 higher per thousand square foot increase in the size of the residence. So, for example, an increase in temperature from 60 degrees to 90 degrees would increase the electric bill by 0.15 times 30, or 4.5, more in a 2,500 square foot home than in a 1,500 square foot home.

Obs   Bill ($)   Size (1000s)   Temp   Size × Temp   Pool
 1     343.32        3.2         63       201.6        1
 2     299.21        3.4         51       173.4        1
 3     302.02        1.6         99       158.4        0
 4     167.94        1.2         60        72.0        0
 5     209.55        2.7         45       121.5        0
 6     367.80        3.1         95       294.5        1
 7     390.06        2.5         94       235.0        1
 8     398.46        3.2         97       310.4        1
 9     224.87        1.6         64       102.4        0
10     313.27        3.0         51       153.0        1
11     209.36        1.6         54        86.4        0
12     355.15        3.1         81       251.1        1
13     344.13        1.7         91       154.7        1
14     453.55        3.7         99       366.3        1
15     184.73        1.2         72        86.4        0
16     372.66        3.8         82       311.6        1
17     264.10        2.2         88       193.6        0
18     325.37        2.6         74       192.4        1
19     204.54        1.8         45        81.0        0
20     205.89        2.2         64       140.8        0

What caused these changes when we introduced the variable Pool? First, there must have been some correlation in multiple dimensions, known as multi-collinearity, between Size, Temp, and Pool in this data. Checking this out, you would find a very strong positive correlation between size and the presence of a pool, and a weaker correlation between average temperature and the presence of a pool. This induces, of course, a correlation between Pool and Size × Temp. This correlation between the included variables and the omitted variable Pool introduced bias into the previous coefficient estimates, since Pool has a direct and important effect on electric bills. Including Pool removed this source of contamination and changed the results. Second, it is entirely possible that any or all of these variables are still correlated in unknown ways with the remaining random error component, which is omitted from the regression.

Flexibility of Functional Form – Log and Other Transformations

We have seen two limited examples of non-linearity in our explanatory variables above – interactions and dummy variables. But, even though the model must be linear in the coefficients to use standard linear regression techniques, the sky is the limit with respect to non-linearity of the variables.⁴ If there is reason to believe a relationship is quadratic (potentially U-shaped or shaped like an inverted U), we can include the square of one of the explanatory variables as a new independent variable. If we think the relationship may be cubic, for example the typical cost function from microeconomics, we can include both the square and the cube of an explanatory variable as additional independent variables. If we think the dependent variable is inversely proportional to one of the explanatory variables, we can include 1/X as an independent variable. All of these transformations of the explanatory variables leave the regression model linear in the coefficients to be estimated, and all of them can be estimated using OLS. Thus, while the standard technique may be referred to as linear regression, it offers a great deal of flexibility with regard to the shape of the approximation.

⁴ There are non-linear regression techniques that allow non-linearity in the parameters, though they are less commonly used. They are, however, beyond the scope of this class.

Of particular interest is the log-linear or constant elasticity model. Anytime we might expect the percentage response of the dependent variable to be a roughly constant multiple of the percentage change in the independent variable, this may be a good approximation to use. For example, suppose we think a 10% increase in income will cause a 10% increase in purchases of quarter pound burgers whether the average price of a burger is $2 or $4. Such a relationship is not captured with a simple linear model. With regard to the electric bill example, it may be reasonable to think the percentage change in the electric bill is a multiple of the percentage change in house size or other variables, so that the absolute increase in the electric bill when house size increases from 1500 to 2000 square feet is larger if the temperature is higher, but is the same in percentage terms. Thus, we could approximate the electric bill with the following power function:

Bill_i = e^(β_0 + β_4·Pool_i + ε_i) · Size_i^β_1 · Temp_i^β_2.    (4.25)

We dropped the separate variable equal to the product of temperature and size because, in this form, the two variables are already multiplied by one another, albeit after being raised to potentially different powers. So, this functional form already builds in interactions between the independent variables. Taking the partial derivative with respect to temperature yields the following:

∂Bill_i/∂Temp_i = β_2·e^(β_0 + β_4·Pool_i + ε_i)·Size_i^β_1·Temp_i^(β_2 − 1) = β_2·Bill_i/Temp_i.    (4.26)

So, as long as β_1 and β_2 are positive, a given increase in temperature has a larger impact on the electric bills of larger residences. We showed in Chapter 2 that taking the natural log of both sides of a constant elasticity approximation (a power function) converts it to an equation that is linear in the logs of the original variables. Taking the logs of both sides, we obtain

ln(Bill_i) = β_0 + β_1·ln(Size_i) + β_2·ln(Temp_i) + β_4·Pool_i + ε_i.    (4.27)

Thus, once we have taken the natural logs of the variables, the model is linear in the logs, and we can use linear regression techniques to estimate the parameter values.
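As a sketch, the estimation is just OLS on the transformed columns. This continues the earlier numpy examples (it assumes the size, temp, and bill arrays from those sketches are in scope) and adds the pool dummy as a 0/1 array:

```python
# Minimal sketch: estimate the log-linear model (4.27) by OLS on logged data.
# Assumes size, temp, and bill from the earlier sketches; pool is the 0/1 dummy.
import numpy as np

pool = np.array([1, 1, 0, 0, 0, 1, 1, 1, 0, 1,
                 0, 1, 1, 1, 0, 1, 0, 1, 0, 0], dtype=float)

X = np.column_stack([np.ones_like(size), np.log(size), np.log(temp), pool])
beta_hat, *_ = np.linalg.lstsq(X, np.log(bill), rcond=None)
# beta_hat should be close to [3.31, 0.28, 0.46, 0.26] per equation (4.28)
pool_effect = np.exp(beta_hat[3]) - 1   # about 0.30: a pool raises the bill ~30%
```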



The transformed data for our electric bill example is given in the table below. Running the regression model above on this transformed data produces the following result:

ln(Bill_i) = 3.31 + 0.28·ln(Size_i) + 0.46·ln(Temp_i) + 0.26·Pool_i.    (4.28)

The interpretation of the coefficients on the first two variables is that a 10% increase in house size increases the electric bill by 2.8%, and a 10% increase in temperature increases the bill by 4.6%. The interpretation of the coefficient on Pool is different. Recall the original power function version of the model: the electric bill is e^(β_0 + β_4·Pool_i + ε_i)·Size_i^β_1·Temp_i^β_2. Basically, when a pool is present, the electric bill that would be incurred without the pool is multiplied by e^0.26 = 1.2969. So, the presence of a pool increases electric bills by about 30%. It seems unreasonable to think a pool would have that large an effect. But, remember, the coefficient is measuring not only the effect of a pool, but also the effect of any omitted appliances and activities that are closely correlated with the presence of a pool (plus, this is hypothetical data anyway). In general, when the dependent variable is in log form, the coefficients on dummy variables are closely related to the percentage change arising from the presence of the condition indicated by the dummy variable. To estimate this percentage effect, we exponentiate the coefficient and subtract 1. (The result will be close to the coefficient, as long as the coefficient is not too far from 0. Those familiar with how natural logs and exponential functions work may see why that is.)

In this chapter, we have shown how to use regression analysis to estimate approximations of underlying economic relationships and how to interpret the results. The technique of regression analysis involves using calculus and algebra to find coefficient estimates that minimize the total (squared) difference between the approximation and the data in a given sample. While the approximation must be linear in the parameters, it can be very non-linear in the variables. However, just because we have estimated an approximation does not mean the approximation is reliable, reasonable, or accurate enough for its intended application. The next chapter takes up evaluation of regression results.



Chapter 4 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.



Chapter 5 Evaluating Regression Analyses

Up to this point, we’ve discussed the process for constructing an approximation based on a regression. However, how do we evaluate the accuracy of a regression? For example, above we presented the results of five different regressions to model electric bills in our hypothetical dataset. How would we decide which was most accurate? How would we decide if the most accurate one was accurate enough?

Bias and Imprecision

The first thing to understand is the difference between imprecision and bias. This idea is easy to illustrate with an example that has nothing to do with economics. Suppose the three stooges are playing darts. Larry is all over the board with no pattern at all, Curly throws a tight group but is always several inches high and right, while Moe throws a tight group that tends to be a little low and left. Larry is unbiased, because he is not consistently off in any particular way, but he is very imprecise. Curly is precise because he throws tight groups, but he is biased because he is always high and right by a wide margin. Moe is relatively precise and only slightly biased.

Imprecision means a lack of consistency, or a high degree of random variation in our results, whether or not they are right “on average”. That is, if we repeated a regression numerous times on different random samples, our estimates would vary a great deal from one regression to another. Bias refers to a non-random difference between the results of a regression and the true underlying model. As touched on in Chapter 4, potential correlation between independent variables in the regression and variables that have been omitted, and thus are part of the error component, introduces omitted variables bias. Because there are always unknown unknowns, this is uncertainty about the true underlying relationships on a very deep level. While there are techniques to limit the impact of such bias, a large measure of judgment is required. How can we speak with complete objectivity about the impact of the unknown unknowns on our regression models? Before tackling the topic of omitted variables bias, it is sensible to increase the quality of our judgment about economic models. So, an in depth discussion of bias in regression models is postponed until much later in the book. For now, we will assume that all inaccuracy within our models is due to imprecision and focus on how to quantify, evaluate, and improve precision.

Evaluating the Model Specification and Data

Before we discuss quantitative measures of accuracy, we consider some general principles that should be followed to increase the reliability of our results. When they are not followed, we have reason to suspect the validity of the findings.



It’s important to include all variables that our theory indicates are major determinants in the approximation. For example, if we wanted to predict how many ice cream cones we plan on selling at a beachfront shop on a given day, it would be important to include a variable that controls for the weather, since rain probably reduces demand. Similarly, it is important NOT to include variables for which there is no theoretical rationale. It may seem natural to include every variable for which data is available, just in case it has some correlation with the dependent variable. This is referred to as data mining, or over-fitting if we decide which variables to include based only on their contribution to the measured “fit” of the approximation to the data, and should be avoided to maintain the integrity of the approximation. It is almost certain that a large number of variables are correlated with past observations of the dependent variable through sheer random chance. If you search long enough, you will find some for sure. But, there is no reason at all to expect such correlations to be due to any actual underlying relationships or for them to hold up in the future – they may simply be spurious. So, no inferences about underlying causation or future outcomes should be drawn from a regression that was arrived at by throwing in every possible variable and keeping the ones that “worked” in that they were correlated. Further, every irrelevant variable included in a regression in essence uses up some of the data, reducing the ability to test the importance of the other, more relevant, variables with limited data. In short, economic theory should be the backbone of our choice of variables.

The actual sample data must be as appropriate to our purpose, as reliable, and as large as is feasible. Often, there are trade-offs between the quality of a dataset and your sample size. For example, a city level analysis may allow you to examine the impact of variations in price, income, and demographic characteristics on demand more easily than state level data, but state level data on many variables such as income and demographic characteristics is often more accurate and available much more frequently than city level data. More data will always improve the accuracy of our regression – as long as it is good data. But, more data that is not suited to our purpose is useless – worse than a small dataset containing accurate observations on the right variables.

When designing a regression, we need to decide what form it will take, not only what variables to include. Should the model be linear, quadratic, log-linear, or take some other form? Is there good reason to assume interaction between two or more of the independent variables? Often, economics suggests important normalizations to make before running a model. For example, demand should often be expressed on a per capita basis and dollar amounts should be adjusted for inflation. These decisions should be based on economic theory insofar as possible, though theory offers less guidance about the detailed shape of an approximation than about the variables that should be included.



Evaluating the Signs and Magnitudes of the Coefficient Estimates

Once we have the regression results, the first thing to do is to check the signs of the coefficient estimates against our expectations based on economic theory. For example, the law of demand states that increasing prices yield decreasing quantities; thus, we should find negative coefficients on product price when approximating quantity demanded. Or, if one of our variables was income and we were dealing with a normal good (one whose demand increases as income increases), we would expect the coefficient on income to be positive. If the signs of the coefficients depart from well established theory, it is a reason to suspect there is something wrong with our model. If there is something wrong with our model, there is no reason to expect it to be stable from one time to the next and therefore no reason to think it has any predictive power. We may have strong expectations about the signs of some coefficients for reasons specific to an individual case. If these expectations are not borne out, we should think carefully about what that means. Is it most likely that our expectations were wrong or that there is some flaw in our data or model? Is it likely that omitted variables are causing the confusion?

We should also consider the reasonableness of the magnitude of the coefficients, though there is no hard and fast rule for what “reasonable” means. Basically, the coefficient estimates should meet the straight face test. For example, if you find that a 1% increase in the price of gasoline decreases market demand by 100%, or that a 100% increase in income increases demand for travel by only 1%, something is wrong. Consulting other studies of similar markets is a good way to get an idea if your coefficient estimates are in the ballpark of reasonable.

Evaluating the Statistical Significance of the Results

After checking that the signs of the estimated coefficients are consistent with established economic theory and the results of other empirical studies, that their magnitudes are reasonable, and making sure that there is a reasonable explanation for any anomalies, it is time to move on to quantitative measures of the reliability and precision of the results. For this, we need to revisit what the regression is doing mathematically. As seen in equations (4.12) and (4.14), the coefficient estimates are chosen to minimize the sum of squared errors, SSE. Another way of looking at it is that we are trying to choose the coefficients to account for as much of the variation in the dependent variable around its mean value as possible. Letting Ȳ represent the mean value of the dependent variable, the total sum of squared variation in the dependent variable, SST, is:

SST = Σ_i (Y_i − Ȳ)².    (5.1)

Similarly, the sum of squared variation attributable to the model, SSM (sometimes known as the regression sum of squares or the explained sum of squares), is:

SSM = Σ_i (Ŷ_i − Ȳ)².    (5.2)


Making use of equations (4.14) and a good bit of algebra, it is possible to show that the sum of squares total is equal to the sum of squares attributable to the model plus the sum of squared errors:

SST = SSM + SSE.    (5.3)

So, minimizing SSE maximizes SSM. These definitions and this last equality are useful in evaluating the model. Analysis based on these types of calculations pertaining to the variation in the data is known as Analysis of Variance, or ANOVA.

In evaluation of the model, a lower SSE is better, all else equal. But, it is always good to have more independent data to use to estimate the model, and the more observations, the higher SSE. What we need is some measure of the typical squared error. Rather than dividing by the number of observations, n, we divide by n − K − 1, which corresponds to the degrees of freedom of SSE. Recall that when fitting a line, or any two-parameter approximation, to two data points, we accounted for the observed data completely. Similarly, since we are estimating K+1 coefficients, we can fit K+1 points perfectly. We therefore only have n − K − 1 independent contributions to SSE. The smaller SSE/(n − K − 1), the more power the model has to account for the dependent variable. For later reference, this ratio is known as the mean square error, or MSE:

MSE = SSE/(n − K − 1).    (5.4)

Similarly, the more variation picked up by the model per explanatory variable, that is, the higher SSM/K, the more explanatory power the variables would seem to have.⁵ This is the mean square due to the model, or MSM:

MSM = SSM/K.    (5.5)

The ratio of these two measures is an F‐statistic:

F = MSM/MSE = (SSM/K) / (SSE/(n − K − 1)).    (5.6)

The F‐statistic is so named because it follows a statistical distribution known as the F‐distribution. We will not be proving that the expression in equation (5.6) follows the F‐distribution. But, it would be good to know if there is reason to believe the K independent variables we selected tell us more as a group about the dependent variable than would K randomly selected variables. The fact that the properties of the F‐distribution are known means that we can use the F‐statistic to shed light on just that question.
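As a numeric illustration, the ANOVA block of the Chapter 4 output can be reproduced with a few lines. A minimal sketch using scipy’s F-distribution (the SSM, SSE, n, and K values are taken from that output):

```python
# Minimal sketch: F-statistic and p-value for the one-variable regression
# output in Chapter 4 (SSM = 73580, SSE = 57190, n = 20, K = 1).
from scipy import stats

n, K = 20, 1
SSM, SSE = 73580.0, 57190.0
MSM = SSM / K                          # (5.5)
MSE = SSE / (n - K - 1)                # (5.4), about 3177
F = MSM / MSE                          # (5.6), about 23.16
p_value = stats.f.sf(F, K, n - K - 1)  # upper-tail probability, about 0.0001
print(F, p_value)
```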

⁵ K is the degrees of freedom of the sum of squares accounted for by the model, even though K+1 coefficients are estimated. That is because SSM = Σ_i (Ŷ_i − Ȳ)², and, while K+1 coefficients are used to calculate the predicted values, one is also used to calculate the mean, so the independent variation around the mean can move in K dimensions, not K+1.



Regression output provides the F-statistic and a p-value associated with it. Roughly speaking, that p-value is the probability that K randomly selected variables with no systematic relationship with the dependent variable would be as highly correlated with the dependent variable as are the K variables of our model. Put differently, it can be thought of as an estimate of the chances that all the correlation in the model is completely spurious. Thus, the lower the p-value, the more confident we can be that our explanatory variables really are systematically related to the dependent variable. If the p-value is low enough, the model as a whole is said to be statistically significant – that is, there is statistical evidence that there really are underlying relationships in the data. What "low enough" means is up to the judgment of the researcher and the user, though 10%, 5%, and 1% are often points of focus in discussions. For those familiar with hypothesis testing, it is a p-value for a test of the null hypothesis that all of the independent variables are unrelated to the dependent variable.

Once we have established that the model as a whole is statistically significant, the statistical significance of the individual coefficient estimates should be evaluated. Just as the sign of the coefficient on price should be negative, since there is a strong theoretical reason to think price is an important determinant of the quantity demanded, there should be statistical evidence that price increases reduce demand. Regression output provides estimates of the coefficient, the standard error of the coefficient, a t-statistic for the coefficient, and a p-value associated with that t-statistic. While deriving the formula for the standard error of the coefficients is beyond our scope, it is important to understand what it means. The standard error of the estimated coefficient on independent variable k is denoted $\sigma_{\beta_k}$. Specifically, it is the square root of the variance of the estimated coefficient. The variance of the estimated coefficient, denoted $\sigma^2_{\beta_k}$, is the expected value of the square of the difference between the estimated coefficient and the true value:

$\sigma^2_{\beta_k} = E\!\left[(\hat{\beta}_k - \beta_k)^2\right]$ .  (5.7)

Imagine collecting many independent samples and running the same regression repeatedly. Due to the random error component, you will not get the same coefficient estimates each time. The variance of the coefficient is an estimate of the typical squared difference between the estimated coefficients and the true underlying coefficient. The standard error of the coefficient is the square root of its variance, so it is an estimate of the typical difference (in absolute terms) to be expected between a particular estimate and the true value. The t‐statistic is just the ratio of the coefficient to its standard error:

$t_{\beta_k} = \dfrac{\hat{\beta}_k}{\sigma_{\beta_k}}$ .  (5.8)


Intuitively, if the coefficient is small relative to the typical error in estimating it, there is no real reason to think it is anything other than zero. Suppose we estimate a log-linear demand approximation and find an estimated price elasticity of demand equal to −0.25 but the standard error is 0.5. The typical error is bigger than the coefficient, so the "true" coefficient could easily be 0, and our finding of −0.25 could be random variation. Thus, there would seem to be no statistical evidence that the demand curve slopes down. That should make us very suspicious of our model specification, the data we used, or both.

The statistical distribution of the t-statistic is similar, but not identical, to the normal distribution in that it has a similar shape and becomes closer to the normal distribution as the sample size gets bigger. Roughly speaking, as long as there are a reasonably large number of observations relative to the number of parameters being estimated, say at least 30, the chances of observing a t-statistic of 2 or more in absolute value are 5% or less. So, a good rule of thumb is that a t-statistic near 2 or larger in absolute value indicates statistical significance. For the example used in the previous paragraph, where the coefficient was −0.25 and the standard error was 0.5, the t-statistic is only −0.5.

Alternatively, we could use the standard error of the coefficient and knowledge of the t-distribution to construct a confidence interval for the true value of the coefficient. Using the rule of thumb described above, an approximate 95% confidence interval for the coefficient is the estimated coefficient plus or minus twice the standard error:

$\hat{\beta}_k - 2\sigma_{\hat{\beta}_k} \le \beta_k \le \hat{\beta}_k + 2\sigma_{\hat{\beta}_k}$ .  (5.9)

The interpretation is that if this regression were run repeatedly on independent samples, an interval constructed like this would contain the true value about 95% of the time. We can test the hypothesis that the true value of the coefficient is zero by checking to see whether the confidence interval contains zero. If it does not, there is statistical evidence that there is some underlying relationship. For the example above, the 95% confidence interval for the demand elasticity would be constructed as follows:

$-0.25 - 2(0.5) \le \eta \le -0.25 + 2(0.5)$ , or $-1.25 \le \eta \le 0.75$ .

Speaking approximately, in the example we are 95% sure the true elasticity of demand falls between −1.25 and 0.75. Thus, the model does not provide solid evidence that the elasticity of demand is even negative! To formalize the idea of statistical significance precisely, rather than by rule of thumb, we need to know the exact statistical distribution of the t-statistic, which, as noted above, has a shape similar to the normal distribution and approaches it as the sample size grows. Importantly, the reported p-value of the t-statistic tells us the chances of observing a t-statistic of a given magnitude (in absolute value) if the true value of the coefficient



is zero. Thus, if the p-value is "low enough", the coefficient is said to be statistically significant. The null hypothesis that the true value of the coefficient is zero may be rejected at a level of significance given by the p-value. This test is only strictly valid under the assumptions of the linear regression model, the most important of which is that there is no omitted variables bias. Again, what "low enough" means is up to those using the results.

The qualitative and quantitative criteria discussed above can tell us if the results of the model are reasonable based on economic theory and if they appear to represent real relationships based on statistical evaluation. Those hurdles must be cleared by any good model. But, they do not tell us whether the model is "good enough". How to make that determination depends on the purpose of the regression. There are two possibilities: to generate a model that can be used to predict the values of the dependent variable, or to generate coefficient estimates to use for decision making purposes.
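As a concrete illustration of these calculations, here is a minimal sketch in Python (using the scipy library) for the elasticity example above; the degrees of freedom value is a hypothetical assumption, since the text does not specify a sample size:

```python
from scipy import stats

# The elasticity example from the text: estimate -0.25, standard error 0.5,
# with (hypothetically) n - K - 1 = 30 degrees of freedom.
beta_hat, se, df = -0.25, 0.5, 30

t_stat = beta_hat / se                       # eq. (5.8)
p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-sided p-value
ci = (beta_hat - 2 * se, beta_hat + 2 * se)  # rule-of-thumb 95% interval

print(t_stat, p_value, ci)  # -0.5, ~0.62, (-1.25, 0.75)
```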

Evaluating the Accuracy of Results – Dependent Variable Prediction

If the purpose is to predict the dependent variable, we need to know how the model fits the data. One measure of that is the R-Squared, or R2, defined as:

$R^2 = \dfrac{SSM}{SST}$ .  (5.10)

Thus, the R2 is the fraction of the total variation of the dependent variable around its mean that is explained by the model. While the R2 is often used as a measure of "goodness of fit", it is of limited usefulness, because it does not tell you how big the errors will be or how much that matters. The root mean square error, also known as the standard error of the regression or the standard error of the estimate, is much more informative. It is equal to the square root of the mean square error and is denoted RMSE or $\hat{\sigma}$:

$RMSE = \sqrt{MSE} = \sqrt{\dfrac{\sum_i (Y_i - \hat{Y}_i)^2}{n-K-1}}$ .  (5.11)

Remember, MSE is found by dividing the sum of squared errors by the degrees of freedom, so that it accurately represents the typical squared error per independent data point. Since we take the square root, the RMSE is an estimate of the typical amount of error in the predicted values based on our model. The RMSE can then be compared to the predicted values to get an idea whether or not the margin of error is acceptable. There are two ways we might do this. First, for approximately normally distributed data, about 95% of the distribution lies within two standard deviations of the mean value (we used this rule of thumb above). That means an approximate 95% confidence interval for the dependent variable given a predicted value is:



$\hat{Y}_i - 2\hat{\sigma} \le Y_i \le \hat{Y}_i + 2\hat{\sigma}$ .  (5.12)

This interval will give us a good idea of the variability of the actual values around a predicted value. Suppose $\hat{Y} = 10{,}000$ and the standard error is $\hat{\sigma} = 10$. A 95% confidence interval for the actual value is [9,980, 10,020]. If we need to order inventory to meet demand, this might be a very good estimate. We could order 10,050 units and meet demand with a very small amount of unsold product relative to the initial order, even if demand turned out on the low end of the interval.

While confidence intervals give us a starting point for determining the accuracy of the regression, their width does not tell us everything. What if our point estimate was $\hat{Y} = 50$ but our standard error was still 10? The new confidence interval is [30, 70], which is clearly much less accurate relative to the size of the prediction. If this is a prediction of demand and we order, say, 75 units to cover the upper end of the interval, there is a very good chance that over 40% of our order will remain unsold. Another, perhaps more useful, way to measure the accuracy of the predictions of a regression is to look at the size of the standard error relative to our point estimate, or the relative standard error, RSE:

$RSE = \dfrac{\hat{\sigma}}{\hat{Y}}$ .  (5.13)

If the standard error is 10 and the predicted value is 10,000, the RSE is $10/10{,}000 = 0.001$. If instead the predicted value is 50, the RSE is $10/50 = 0.2$. So, it is not the standard error in isolation that matters, but its magnitude relative to the predicted value of interest to the user. There is no concrete rule for what value is acceptable; it all depends on the situation at hand. If inventory costs are small, for example, maybe a large RSE won't be a problem. If storage is expensive, the RSE will need to be smaller before the regression is "good enough".

Special problems arise when evaluating the accuracy of models in which the dependent variable has been transformed, such as in log-linear models. This is because the root mean square error of the log variable is not the same as the root mean square error of the untransformed variable, which is ultimately of interest. The problems are compounded when we are comparing models in which the variable has been transformed to models in which it has not. This discussion gets technical – sorry, I could not come up with any way around it!

To make the comparison, we have to transform the predicted transformed values back into the untransformed state. For example, if we have a prediction of the natural log of Y, $\widehat{\ln Y}$, we need to transform that into a prediction of Y. How would we do that? It seems the answer would be to exponentiate $\widehat{\ln Y}$ to get $\hat{Y} = e^{\widehat{\ln Y}}$. That is, if the predicted log of Y is 3.168, it seems the predicted value of Y would be $\hat{Y} = e^{3.168}$. That is not quite true, darn! Why not? The log-linear model looks something like



$\ln(Y) = \beta_0 + \sum_k \beta_k \ln(X_k) + \varepsilon$ ,  (5.14)

where the error term is approximately normally distributed and has an expected value of zero. But, the constant elasticity model from which it is derived looks like:

$Y = e^{\beta_0} X_1^{\beta_1} X_2^{\beta_2} \cdots X_K^{\beta_K} e^{\varepsilon}$ .  (5.15)

Once we have estimated coefficients, we can get predicted values, so we have

$\widehat{\ln Y} = \hat{\beta}_0 + \sum_k \hat{\beta}_k \ln(X_k)$ .  (5.16)

Exponentiating these indeed gives

$e^{\widehat{\ln Y}} = e^{\hat{\beta}_0} X_1^{\hat{\beta}_1} X_2^{\hat{\beta}_2} \cdots X_K^{\hat{\beta}_K}$ .  (5.17)

The trouble is that while the mean value of $\varepsilon$ is 0, the mean value of $e^{\varepsilon}$ is not quite 1, even though $e^0 = 1$! That is because if $\varepsilon$ is normally distributed with a mean of 0, the distribution of $e^{\varepsilon}$ is skewed to the right, meaning it has a median value of 1 and a mean that is a bit higher. So, if you want to predict the median value of Y, exponentiate the predicted log of Y. That is, if you used equation (5.17) to predict values for Y repeatedly, it would give you the median of the distribution of actual outcomes. If you want to predict the mean, or expected value, of Y, a better estimate is:

$\hat{Y} = e^{\widehat{\ln Y} + \hat{\sigma}^2/2} = e^{\hat{\beta}_0} X_1^{\hat{\beta}_1} X_2^{\hat{\beta}_2} \cdots X_K^{\hat{\beta}_K} e^{\hat{\sigma}^2/2}$ ,  (5.18)

where $\hat{\sigma}$ is the RMSE from the log regression. To evaluate the predictions of a log-linear model, you must estimate the predicted values of the dependent variable using equation (5.18) and then use those to estimate the RMSE of the predicted values of Y. Many readers will not have absorbed anywhere near all of the import of the last four paragraphs on a first reading. Before going back and reading them again, read on to the end of the chapter, where we will go through an example that will make this more concrete.

Evaluating the Accuracy of Results – Coefficient Estimates

Similar considerations occur when the objects of interest are the coefficient estimates themselves. Above, we discussed constructing confidence intervals for the coefficients. Recall that, speaking approximately, 95% of the time:

$\hat{\beta}_k - 2\hat{\sigma}_{\beta_k} < \beta_k < \hat{\beta}_k + 2\hat{\sigma}_{\beta_k}$ .

Looking at the interval can give us a good idea of the general variability of our estimate. But, we should further use this understanding of the margin of error in our estimates to get some idea of what a typical error might cost us. For example, suppose we want an estimate of the price elasticity of demand to set the profit-maximizing price. Suppose the estimate were −6 with a standard error of 2. Then, a 95% confidence interval for the elasticity of demand would be



$-10 < \eta < -2$ .

Assuming we were going to use this to determine the profit-maximizing price, we can use equation 2.18 to construct a point estimate and a confidence interval for the markup factor, $\eta/(1+\eta)$. Plugging in our estimates for the elasticity, we find the interval for the markup factor to be

$\left[\dfrac{-10}{1-10},\ \dfrac{-2}{1-2}\right]$ , or $[1.11,\ 2]$ ,

and the point estimate to be

$\dfrac{-6}{1-6} = 1.2$ .

This suggests that our profit-maximizing markup factor over marginal cost should be somewhere between 11% and 100%, with a point estimate of 20%. Or, if marginal cost is $100, it suggests a price between $111.11 and $200, with a point estimate of $120. Clearly, this is a very broad range for the optimal price. Setting a price of $120 might cost a lot of profit if the optimal price were really $111 or $200. So, it probably makes sense to go back and try to improve the model.
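A quick sketch of the markup calculation above in Python; the marginal cost figure is the $100 assumed in the text:

```python
# Point estimate and rough 95% interval for the markup factor eta/(1+eta),
# using the elasticity estimate (-6) and standard error (2) from the text.
eta_hat, se = -6.0, 2.0
markup = lambda eta: eta / (1 + eta)

lo, hi = eta_hat - 2 * se, eta_hat + 2 * se   # elasticity interval (-10, -2)
print(markup(eta_hat))                        # 1.2 -> 20% markup over MC
print(markup(lo), markup(hi))                 # ~1.11 and 2.0

mc = 100.0                                    # marginal cost from the text
print(mc * markup(lo), mc * markup(hi))       # price range ~111.11 to 200
```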

Improving Precision

After looking at the regression results quantitatively, we may want to improve the soundness and accuracy of the model. It is possible to improve the model by collecting more data. More data can mean more variables or more observations. It may seem natural that more data is always better, but this is not the case. In fact, variables that have little or no systematic relationship with the dependent variable can actually distort any "good" data that may be in the regression. Similarly, adding more observations can be a bad idea if it means going to lower quality data – that is, less precisely measured data. So, the solution for increasing accuracy is to make sure you have more high quality data on variables that ought to be included on the basis of economic theory.

The problem is that this can become expensive. Firms face a trade-off between the quantity and quality of the data available for analysis and the cost of that data. The more important it is to have an accurate approximation of demand, cost, or whatever is being studied, the more it will make sense to spend to obtain more and better observations and observations of more of the important variables. On a similar note, discarding data that shouldn't be there will also increase the reliability of the results. The computer is ultimately responsible only for the relatively simple calculations described earlier; it has no way of judging the substance of the economic relationships we are trying to measure.



Aside from adding additional observations, and from adding appropriate variables or dropping inappropriate ones (based on theory and solid reasoning, not data mining), reflection may allow us to find ways to improve the specification. We may find that interactions between price and income are important, or that senior citizens are more price-sensitive, so that an interaction between age and price is important. Careful thought can often improve models that are not performing well, though the performance of the model is ultimately limited by both the care of the researcher and the quality of the data available.

Evaluating Results – an Extended Example

To make all of this more concrete, let's tie it to some regression output. The output below is from Microsoft Excel and corresponds to the model of electric bills expressed in equation (4.24). The independent variables are Size, Temp, SizeXTemp, and Pool. The R Square of 0.949 means the model accounts for 95% of the variation of the electric bill around its mean value observed in the data. The output under the heading ANOVA corresponds to various sum of squares calculations. The line labeled "Regression" gives sum of squares data for the model. So, SSM is 124,040. Since SST is 130,770, we see where the R Square came from:

$R^2 = \dfrac{SSM}{SST} = \dfrac{124{,}040}{130{,}770} = 0.95$ .  (5.19)

The standard error of the estimate, or RMSE, is given in the first set of numbers at the top as 21.2. Under the ANOVA section, the column labeled MS corresponds to the mean square calculations, where each sum of squares is divided by the appropriate degrees of freedom. MSE is 449, and the square root of that gives:

$RMSE = \sqrt{MSE} = \sqrt{449} = 21.2$ .  (5.20)

Suppose we are interested in the accuracy of a predicted electric bill for a house with 2,500 square feet and a pool when the temperature is 85 degrees. The predicted value would be calculated as follows:

$Bill = 59.64 + 20.12(2.5) + 1.62(85) + 0.15(2.5)(85) + 74.93 = 354$ .  (5.21)

Given the RMSE, the relative standard error is:

$RSE = \dfrac{21.2}{354} = 0.06$ .  (5.22)

That means the model will typically miss the actual bill on a house with these characteristics by about 6%. Alternatively, a rough 95% confidence interval is:

$354 - 2(21.2) \le Bill \le 354 + 2(21.2)$ , or $312 \le Bill \le 396$ .  (5.23)

From the row of the ANOVA table labeled regression, the MSM is 31010. Dividing by the MSE gives the F‐statistic:



$F = \dfrac{MSM}{MSE} = \dfrac{31{,}010}{449} = 69.1$ .  (5.24)

The P-value for that F-statistic is given as 0.000000002. Thus, the null hypothesis that the true coefficients are zero for all independent variables is almost certainly not true. Put differently, the chances of getting a completely spurious correlation this strong are negligible. That is a good indicator about the quality of the model.

Regression Statistics
Multiple R          0.97
R Square            0.95
Adjusted R Square   0.93
Standard Error      21.2
Observations        20

ANOVA
             df        SS       MS      F    P-value
Regression    4    124040    31010   69.1    2E-09
Residual     15      6730      449
Total        19    130770

             Coef.   Std Err   t Stat   P-value   Lower 95%   Upper 95%
Intercept    59.64     74.61     0.80      0.44      -99.39      218.66
Size         20.12     29.28     0.69      0.50      -42.29       82.54
Temp          1.62      0.96     1.69      0.11       -0.43        3.68
Size X Temp   0.15      0.35     0.41      0.68       -0.61        0.90
Pool         74.93     16.21     4.62      0.00       40.38      109.48

Looking at the individual coefficients, we can see that they all have the expected signs. Further, a bit of calculation would reveal that the magnitudes of the coefficients are not unreasonable. The standard error for the coefficient on Pool is small relative to the coefficient, so the t-statistic is large, the P-value is small, and zero does not lie within the bounds of the confidence interval provided. The standard error of the coefficient on Temp is larger relative to the coefficient, but the t-stat is somewhat close to 2, the P-value is relatively small, and the confidence interval barely contains zero. So, statistically, it seems there is evidence of a relationship between temperature and the electric bill, though the evidence is perhaps somewhat weaker than we might have hoped.
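Output like the Excel tables above can be reproduced with standard regression software. Here is a sketch using Python's statsmodels package; the file name and column names are hypothetical stand-ins for the example's data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical DataFrame with the same columns as the example: Bill, Size
# (thousands of square feet), Temp (average temperature), Pool (0/1 indicator).
df = pd.read_csv("electric_bills.csv")  # assumed file; 20 observations

# Model 4: Bill on Size, Temp, their interaction, and Pool.
model = smf.ols("Bill ~ Size + Temp + Size:Temp + Pool", data=df).fit()

print(model.summary())               # coefficients, std errors, t-stats, p-values, CIs
print(model.rsquared)                # R Square (0.95 in the output above)
print(model.mse_resid ** 0.5)        # RMSE / standard error of the regression (21.2)
print(model.fvalue, model.f_pvalue)  # F-statistic (69.1) and its p-value
```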

The standard errors are large relative to the coefficients for both Size and for the interaction of Size and Temp. That means the t-statistics are small and the corresponding P-values are large. Looking at the upper and lower limits of the 95% confidence intervals provided in the output, we see that zero is well within the confidence interval for these two variables. In other words, there is no particularly strong statistical reason to think that either Size or its interaction with Temp is significant individually. A more advanced test would be to see if there is evidence that the two matter when taken together – but that is



beyond our scope. For now, we note that while we may have strong theoretical and logical reasons to suspect that both those variables actually matter, those parameters are not measured very precisely. If we need the actual parameter estimates for Size or its interaction with Temp, we probably need to try to collect more data. This would be especially true if we have any guesses about variables that are missing from our dataset that might be confounding our attempts to measure the effect of size precisely. It would also be true if we could collect data in which Size and SizeXTemp exhibit greater variation, since we can trace out the effects of independent variables more accurately the more variation we observe in them in the sample data.

Let's use the information above to compare, and ultimately choose between, various models of electricity bills. The table below collects the coefficients, their standard errors, the R2, the RMSE, and the F-statistic and its P-value for all six regressions in one place for comparison. The first four are the linear models estimated earlier in the chapter. The fifth is an additional linear model that includes the pool indicator variable but does not include the interaction term. The sixth is the log-linear model from earlier in the chapter. In the table, asterisks indicate the general range of the P-value associated with a coefficient: one asterisk means the P-value was less than or equal to 0.1, two means less than or equal to 0.05, and three means less than or equal to 0.01. So, they indicate approximate levels of statistical significance.

All of our regressions are highly significant taken as a whole, as indicated by the P-values of the F-statistics. In the linear regressions, the RMSE takes a large jump down when temperature is added moving from model 1 to model 2. When the interaction term between size and temperature is added in model 3, the RMSE actually goes up and the significance of the regression as a whole goes down. Looking at the coefficients, while both of those variables were very significant statistically in model 2, with small standard errors, their standard errors go up a lot when the interaction is included in model 3. Moreover, the sign of the interaction term is contrary to our expectations.

It is hard to say whether model 2 or 3 is "better". Within the sample, model 2 gives (slightly) better predictions, as measured by RMSE. But, there is a strong reason to suspect the interaction matters. In trying to sort out what went wrong in model 3, there is a ready suspect – we may simply not have enough independent variation in size and temperature. In that case, the product of the two will be very highly correlated with the individual variables, making it impossible to sort out what part of the variation in electric bills is driven by size, temperature, or their interaction, since they all tend to move together. This is true in spite of the fact that we know from the high significance of the regression as a whole that the group of variables as a whole is strongly associated with variation in the electric bill.



Summary of Regression Output for Electric Bill Example
Standard errors in parentheses. * P-value ≤ 0.1. ** P-value ≤ 0.05. *** P-value ≤ 0.01.

Model #        (1)        (2)        (3)        (4)        (5)        (6)
               Linear     Linear     Linear     Linear     Linear     Log-Linear
Size           75.11***   64.07***   79.57*     20.12      31.65***   0.28***
               (15.61)    (8.79)     (39.67)    (29.28)    (8.95)     (0.08)
Temp           --         2.48***    3.01**     1.62       2.01***    0.46***
                          (0.38)     (1.38)     (0.96)     (0.27)     (0.08)
SizeXTemp      --         --         -0.21      0.15       --         --
                                     (0.52)     (0.35)
Pool           --         --         --         74.93***   73.47***   0.26***
                                                (16.21)    (15.41)    (0.06)
Intercept      111.27**   -43.66     -82.51     59.64      30.88      3.31***
               (40.56)    (32.77)    (102.49)   (74.61)    (26.75)    (0.33)
F-statistic    23.16      58.94      37.41      69.12      97.13      77.08
P-value        1.4E-04    2.3E-08    1.8E-07    1.8E-09    1.8E-10    1.0E-09
R Square       0.563      0.874      0.875      0.949      0.948      0.935
RMSE           56.37      31.14      31.93      21.18      20.63      0.08

Analysis of the raw data shows the linear correlation coefficient (which ranges from −1 for a perfect negative relationship to +1 for a perfect positive relationship) is 0.19 between size and temperature, 0.82 between size and the interaction of size and temperature, and 0.70 between temperature and the interaction variable. It is not surprising that we were unable to sort out the independent effects of the three variables precisely. The solution is to collect more data. We need data with more variation in size, more variation in temperature, and, in particular, more variation in temperature for each given size and more variation in size for each given temperature. If it is not possible to obtain data that exhibits such variation, it will not be possible to identify the separate effect of the interaction term.

While the negative sign on the interaction in model 3 may be due simply to the inability to sort out the effects of the three independent variables from one another, there is another possible culprit. As alluded to previously, it is possible that bigger houses are newer and therefore better insulated. The electric bill will rise less with temperature increases in a well insulated home than in a poorly insulated one. The interaction of age, and therefore insulation, with temperature may be getting mixed up with the interaction of size and temperature. To get an idea about that possibility, we need to get data on age. Separate data on the degree of insulation would be good to have, too.



When Pool is added in model 4, the RMSE takes another large step down, and correspondingly, the R2 jumps up. As discussed earlier in the chapter, the sign on Pool makes sense, even if the magnitude may seem a little large to be due to the pool in and of itself, and the coefficient is statistically very significant. In model 4, the sign of the interaction term becomes positive, but it remains small, imprecisely estimated, and statistically insignificant. Because Pool clearly seems to work, but it is not clear if the interaction should be included, model 5 represents a regression with Size, Temp, and Pool, but without the interaction term. The RMSE is again slightly lower without the interaction. With Pool included, and without the interaction, the coefficients on Size and Temp are again highly significant, as in model 2.

Of the first five models in the table, which is best? Since there are solid logical and theoretical reasons to think Size, Temp, and Pool all matter, and since the evidence is strongly consistent with that hypothesis, it is between model 4 and model 5. If we were to adopt the criterion that a variable should only be added when it reduces the prediction error, that is, the RMSE, we would go with model 5. However, there is a good reason to think that the interaction does matter. The best approach is to try to collect more and better data. If we have to use one of these models, which is better is a toss-up. Since we have only 20 observations, a good argument may be made for dropping the interaction and going with model 5, since we just do not have enough data to do a good job of estimating 5 coefficients. If we had 100 observations and similar results, that argument would not apply.

The last column of the table contains the results of the log-linear regression. The coefficients all have the right signs and are very statistically significant. Remember, the coefficients on Size and Temp represent elasticities. Judging from the P-value of the F-statistic, the model as a whole is very statistically significant. While the model explains 93.5 percent of the variation in the log of the electric bill with a RMSE of only 0.08, those numbers cannot be compared with the results from the other regressions, since the dependent variable has been transformed. Instead, we have to use equation (5.18) to convert the predicted log of the bill to a predicted value for the mean realization of the bill for each observation. We then calculate the squared residual for each observation, add them up, and divide by n−K−1 to get the mean square error. Taking the square root gives the RMSE as 21.87. This calculation is performed in the table below. Thus, the log-linear model produces a slightly larger prediction error than do models 4 and 5, but only slightly so. In addition, model 6 does build in some interaction between size and temperature, as shown in equation (4.26).



Obs    Bill     Log Bill   Predicted   Predicted Bill (Mean)   Residual   Squared
                           Log Bill    e^(ln Y-hat + σ̂²/2)                Residual
1      343.32   5.84       5.8051      333.11                   11.31      127.95
2      299.21   5.70       5.7251      307.49                   -7.26       52.67
3      302.02   5.71       5.5578      260.11                   42.78     1830.11
4      167.94   5.12       5.2458      190.40                  -21.83      476.69
5      209.55   5.34       5.3433      209.90                    0.35        0.12
6      367.80   5.91       5.9851      398.80                  -29.67      880.35
7      390.06   5.97       5.9193      373.39                   17.91      320.80
8      398.46   5.99       6.0037      406.28                   -6.47       41.86
9      224.87   5.42       5.3570      212.81                   12.77      163.04
10     313.27   5.75       5.6896      296.77                   17.49      305.74
11     209.36   5.34       5.2789      196.80                   13.21      174.63
12     355.15   5.87       5.9118      370.60                  -14.22      202.11
13     344.13   5.84       5.7951      329.77                   15.46      238.96
14     453.55   6.12       6.0543      427.35                   27.63      763.24
15     184.73   5.22       5.3297      207.06                  -21.64      468.44
16     372.66   5.92       5.9751      394.84                  -20.86      435.23
17     264.10   5.58       5.5938      269.66                   -4.66       21.74
18     325.37   5.78       5.8203      338.21                  -11.72      137.26
19     204.54   5.32       5.2284      187.11                   18.05      325.94
20     205.89   5.33       5.4473      232.91                  -26.24      688.40

                                                     SSE      7655.27
                                                     MSE       478.45
                                                     RMSE       21.87
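The retransformation calculation in the table above can be scripted. Here is a sketch using Python and statsmodels, again with a hypothetical data file; it implements equation (5.18) and then computes the RMSE in levels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("electric_bills.csv")  # hypothetical file, as before

# Model 6: the log-linear regression.
logm = smf.ols("np.log(Bill) ~ np.log(Size) + np.log(Temp) + Pool", data=df).fit()
sigma2 = logm.mse_resid                  # squared RMSE of the log regression

# Eq. (5.18): predict the mean of Bill, not just exp of the predicted log.
pred_bill = np.exp(logm.fittedvalues + sigma2 / 2)

# RMSE in levels, comparable with the linear models.
resid = df["Bill"] - pred_bill
dof = len(df) - logm.df_model - 1        # n - K - 1
rmse_levels = np.sqrt(np.sum(resid ** 2) / dof)
print(rmse_levels)                       # about 21.9 in the table above
```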

Models 4, 5, and 6 all have similar explanatory power within the sample, with a very slight edge to model 5. But, models 4 and 6 allow for interaction between size and temperature. In short, each of these models seems about as good as the others, based on the available data. Additional data that showed more variability in the independent variables and included other relevant information, such as the age and insulation quality of the homes and family size and income, might produce a better model and a clearer choice of which model is best. If this is all the data that is cost effective to gather, and we are going to use the model to make predictions of electric bills within the range of the data, which model we choose will not matter all that much, because the RMSE is similar for all three. Since the RMSE is relatively small and the R2 correspondingly high, any of these models may be accurate enough for purely predictive purposes; depending on how much accuracy is needed for the particular application, it may not be worth the cost of collecting more data.

Suppose, however, a property management company wants a good estimate of the effect of temperature on the electric bills of different size residences in order to help them decide on a plan for reducing energy expenses for the properties they



manage. Then we have a problem. Model 5 does not allow for interaction, but we don't get precise estimates from model 4. Perhaps it would make sense to use model 6. But, the fact that the effects in model 4 can't be precisely identified should cause some hesitation about relying on the results of model 6 for this purpose, too. If we need precise estimates of the effects of size or temperature on the bill, it is absolutely necessary to collect more data, with more variables and more variability in the independent variables. That question just is not adequately answered yet.

This hypothetical example was generated from a known "true" model. So, unlike in actual applications, we can compare each of the models above to the "truth". The underlying model is:

$Bill_i = 100 + 6\,Size_i + 0.9\,Temp_i + 0.4\,Size_i \times Temp_i + 80\,Pool_i + \varepsilon_i$ .  (5.25)

The independent variables for the example, including the random disturbance, were generated randomly. The random disturbance was generated to have a correlation of −0.28 with size. This represents an inverse relationship between age and size, where age affects the bill but has been omitted from the model and left as part of the error component. The estimated effect of Pool is close to correct in models 4 and 5. However, the other coefficient estimates are far off in model 4, even though it is the closest. The true marginal effect of temperature on the electric bill is:

$\dfrac{\partial Bill}{\partial Temp} = 0.9 + 0.4\,Size$ .  (5.26)

However, based on model 4, the estimate would be:

$\dfrac{\partial Bill}{\partial Temp} = 1.62 + 0.15\,Size$ .  (5.27)

The effect of temperature is much more dependent on the size of the residence than our empirical models found. The "true" and "estimated" effects of temperature on houses of different sizes are shown in the figure below. Below sizes of 2,880 square feet, model 4 overestimates the effect of temperature on the electric bill. Above 2,880 square feet, model 4 underestimates the effect. The figure makes it clear that residence size actually has a much larger impact on the effect of temperature on electric bills than model 4 would indicate.



[Figure: Marginal Effect of Temperature – the "True" line (0.9 + 0.4·Size) and the "Estimated" line (1.62 + 0.15·Size) plotted against Size (Thousands of Square Feet); the lines cross at about 2.88.]

What should we take away from this example? First, it is critical to have lots of independent variation in the independent variables, or it may be impossible to identify their effects accurately. Second, important factors, such as the age of the building, should not be left out of the model. Doing so may make it impossible to identify the effects of the individual variables. Third, you are never sure you have not left something important out. Thus, it is likely that any empirical model suffers from omitted variables bias to one degree or another. Fourth, it is harder to be sure your coefficient estimates are right than to get good predictions of the dependent variable, largely due to the first three points. Fifth, even when your specification is "right", you may not be able to tell that it is any better than other similar models. Remember, the RMSE was lowest for model 5, not model 4, and the coefficients in model 4 were not statistically significant. Even though model 4 was "right", there was no way to be sure of that from the output. Finally, even though the statistical procedures associated with estimating models can look very sophisticated and technical, a great deal of judgment, art, and humility should go into formulating, interpreting, and using any empirical approximation.

Limits of approximations

While regression analysis allows us to estimate approximations for both predicting dependent variables and estimating coefficients, and to test hypotheses, it has limitations even when the models are well estimated, precisely and carefully evaluated and interpreted, and cautiously applied. In closing, we review the limitations we have already mentioned and discuss some additional ones as well.

First, remember not to extrapolate beyond the range of the data. Determining the range of the data can be more difficult than simply checking whether all the independent variables for the case we are predicting fall within the ranges of the corresponding independent variables in the sample. For example, suppose we have data containing observations where income is $30,000 to $40,000 and price is $2 to $4, and other observations where income is $60,000 to $70,000 and price is $4 to $6, and we use this to estimate a demand curve. Suppose we want to use these results to predict demand when income is $35,000 and price is $5.50. Since the range of



income is $30,000 to $70,000 and the overall range of price is $2 to $6, it may seem as if the point we wish to predict is well within the observed range. However, we never observe prices over $4 in places with incomes below $60,000. So, there is no support in the data for a prediction when price is $5.50 and income is $35,000.

Second, even a model that accounts very well for variation within the sample may not hold up when applied to other data, even when the range of data in the sample covers the range for which we are predicting. It is best to verify the model on data that was not included in the sample on which the model was estimated, if enough data is available for that. That will help make certain that the model you ended up choosing did not fit your sample best due simply to spurious correlations in the particular subset of data you used to estimate the model. However, that may not be enough to guarantee the model is applicable to other data points. Past performance is no guarantee of future success!

Third, the last point can be pushed further. A regression is estimated given the underlying structure of the market and the institutions governing the processes generating the data over a given interval of time. However, the nature of markets and institutions changes over time. Such structural breaks can lead to large changes in the relationships between dependent and independent variables. Don't make the mistake of thinking that the "right" model is necessarily "right" outside of the specific institutional context which gave rise to the data from which it was estimated.

Finally, the gold standard for estimating the effect of an independent variable on an outcome variable is a double blind randomized trial. Making the trial double blind and randomized ensures that the independent variable under study is not systematically related to other factors that affect the outcome variable. When we estimate a regression model, the most important assumption we make is that the included independent variables are not confounded with other variables that affect the outcome but are not included in the model. Without explicitly randomizing the assignment of individuals to things like income levels and prices, there is just no way to ensure this assumption is met. It is a huge assumption; violating it has major implications for the validity and reliability of empirical analyses, and we know it is never strictly true. That does not mean we should abandon empirical investigations and applications of economic theory. Imperfect models and approximations are more useful than no models and approximations, provided we are humble and cautious in their application. It also means we should take whatever steps are reasonable to mitigate omitted variables bias. Chapter 6 is devoted to this topic.



Chapter 5 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Bias – Occurs when an estimate is inaccurate in a consistent, systematic way.

Imprecision – Occurs when there is nonsystematic, random error in an estimate.

Data Mining – Also called over-fitting; it occurs when variables that have no theoretical connection to the model are added simply to find correlations that improve within-sample explanatory power.

Standard Error – An estimate of the amount, on average, by which a prediction or coefficient estimate will be in error. The square root of the estimated variance of the prediction or coefficient.

Degrees of Freedom – The number of values in the final calculation of a statistic that are free to vary.

Root Mean Square Error (RMSE) – The standard error of the regression; the square root of the mean square error, and an estimate of the typical prediction error.

Coefficient of Determination (R2) – The fraction of the variation in the dependent variable (y) that is explained by the independent variables (the x variables). However, because variables can be highly correlated without one causing the other, and because R2 does not tell you how large the errors are in absolute terms, the RMSE is often a better indicator of the accuracy of the model.

F-statistic – The ratio of the mean square due to the model to the mean square error.

t-Statistic – The coefficient estimate divided by its standard error.

P-value – The probability of observing a statistic as large (in absolute value) as the one estimated if the true value is zero. The lower the p-value, the stronger the statistical evidence that the independent variable is truly correlated with the dependent variable.



Chapter 6 Omitted Variables Bias

When we first talked about estimating approximations and regression analysis, we introduced the concepts of precision and accuracy. Imagine that three archers are shooting at a target, and the results are as shown in the figure below. Archer 1's shots are all over the place, and they aren't near the center. Since the way in which they are off target does not show any strong tendency, he is not biased. But, the shots are very imprecise. This corresponds to a large standard error. Archer 2's shot placement does not vary a lot, but he is not accurate, since he has a strong tendency to be high and right. This corresponds to bias. Archer 3's shot placement shows as little variance as that of Archer 2, but he is much less biased.

[Figure: a target showing the shot groupings of Archers 1, 2, and 3.]

The ideal empirical method is the randomized double blind trial. The randomized part comes from selecting a sample and randomly dividing it into two groups. You administer the treatment to one group, the treatment group, and administer a placebo to the other, the control group. The double blind part means both those that administer the treatment and those that receive it don't know which is the control group and which is the treatment group until the end. With group assignment randomized and completely unknown to all participants, there is no way for the outcome to be biased.

An economic experiment can in theory be conducted using these same principles. Suppose we are testing the effect of a marketing campaign on a certain product, and we use two randomly assigned groups of customers, one being the control. Only the treatment group is exposed to the campaign. The results of the study are shown in the table.

             Avg Quantity
Group        Before   After
Treatment    26       32
Control      25       21

The change in quantity for the treatment group was +6 and the change for the control group was −4; but this is not what we're concerned with. What we're concerned with is the difference between these changes. In our example, the difference in differences (D.I.D.) is (6) − (−4) = 10. You could calculate some measure of accuracy using ANOVA tables (or other statistical tools), and if you conducted multiple experiments that returned the same estimate, you could start drawing some conclusions about the effect of the marketing campaign.

The problem with empirical research in economics is that it is nearly impossible to conduct randomized double blind experiments on a large scale with stakes large enough to be meaningful. How could we randomly assign customers to markets with different prices, for example? Even if we could, how could we keep both the subjects and the data collectors ignorant of prices in other cities?
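For concreteness, the difference-in-differences arithmetic from the marketing example is trivial to script:

```python
# Difference-in-differences for the marketing example above.
treatment_before, treatment_after = 26, 32
control_before, control_after = 25, 21

treatment_change = treatment_after - treatment_before  # +6
control_change = control_after - control_before        # -4
did = treatment_change - control_change                # 10
print(did)
```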



An alternative to a controlled experiment is to use observational, or non-experimental, data and use regression methods (such as ordinary least squares, or OLS) to fit a model, and draw conclusions based on the results. Recall the general form for a linear regression model is

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_K x_K + \varepsilon = \sum_{k=0}^{K} \beta_k x_k + \varepsilon$ ,

where $\varepsilon$ is the random error, the x's are the independent variables (with $x_0 = 1$), and y is the dependent variable. The fundamental assumption when using OLS regressions is that the x's are uncorrelated with the error term. If this is true, then the regression is not biased. In economics, however, many variables you might want to use as independent variables to explain a dependent variable depend in turn on other variables within the system; that is, most variables are endogenous by nature. Income and price are important demand determinants. But, price depends on the interaction of supply and demand, so price depends on income and on everything else that determines demand. Similarly, income is determined by the supply and demand for labor of various types, which depends on many of the same things that affect demand. Thus, it is very likely that the different independent variables in a regression will be correlated with omitted variables present in the error term; but since you didn't measure the omitted variables directly (by definition – or they would not be in the error term), there's no way to know. This makes the identification of the effects of the individual variables in your regression very difficult, and since you can't identify which variables have been omitted, it is difficult to know how biased your regression results are.

This can be illustrated using supply and demand. Suppose at a particular date and time you observe a price of p1 and a quantity of Q1, at a different date and time you observe a price of p2 and a quantity of Q2, and on yet another day you observe a price of p3 and a quantity of Q3. You then use these points to estimate a demand curve. Each observation of equilibrium price and quantity represents the interaction of a potentially different supply and demand curve. This is shown in the figure below. If you find the line that best fits these three points, it will look something like the dashed line labeled $\hat{D}$. The line that best fits the data is hardly the demand curve. It is just a line that comes closest to fitting points that depend on supply as much as demand.

[Figure: three equilibrium points, each at the intersection of a different supply curve (S1, S2, S3) and demand curve (D1, D2, D3), with the fitted dashed line $\hat{D}$ passing near all three.]

Algebraically, let's assume $Q_D = a - bp + \varepsilon_D$ and $Q_S = c + dp + \varepsilon_S$. We know in



equilibrium quantity demanded has to equal quantity supplied. We can thus solve for the equilibrium price as follows:

$Q_D = a - bp + \varepsilon_D = c + dp + \varepsilon_S = Q_S$

$(d + b)p = a - c + \varepsilon_D - \varepsilon_S$

$p^e = \dfrac{a - c + \varepsilon_D - \varepsilon_S}{d + b}$

Equilibrium price depends on everything captured by a and c (income, costs, etc.) AND the error in the demand equation. Thus the actual equilibrium price is directly correlated with the omitted variables that affect quantity demanded. If we try to use standard regression analysis to estimate the demand curve, we would put quantity on the left and price and other demand determinants on the right. But, the higher the demand error, the higher price will be. Since the right hand side variable, price, is directly related to the error term, when there is a positive demand shock, the direct effect will be to boost demand, but the increase in price that comes with the demand shock will cut quantity demanded. It is therefore not obvious how price and quantity are related, and an OLS regression certainly can't sort it out. It could easily look like increases in price are associated with increases in quantity demanded when all that is happening is that positive demand shocks are pulling up demand, thereby increasing both quantity and price. Since we can't measure the demand error, we're stuck with this endogeneity problem.

Basically, price and quantity depend on each other, as well as on everything else inside the system (income, wages, etc.), some of which is omitted from the model. The fact that many things are endogenous, or determined simultaneously, means many variables on the right side of an equation will be correlated with the error term. This form of omitted variables bias is called endogeneity bias or simultaneous equations bias. The fact that so many things in economics are potentially endogenous means that OVB is systematic and widespread, not simply confined to circumstances where explanatory variables just happen by chance to be correlated with omitted variables – there are systemic reasons to expect such correlations to be widespread.

Another systematic source of omitted variables bias is measurement error. We know the dependent variable of a regression has an error term associated with it due to the influence of variables we have not included in the empirical model. But variables, both dependent and independent, are also observed and measured with error. Think of this type of error as ordinary data entry error, or error that results from measurement to a limited number of significant digits, and so on. Suppose we have the following "true" model for y:

$y = b_0 + b_1 x + \varepsilon_y$ .

Suppose that rather than observing the true value of x, we observe or measure a value $x^m$, which is the true value plus our measurement error:



$x^m = x + \varepsilon_m$ .

Solving this for x, we have:

$x = x^m - \varepsilon_m$ .

Substituting this into the model above, we get

$y = b_0 + b_1(x^m - \varepsilon_m) + \varepsilon_y$

$y = b_0 + b_1 x^m + (\varepsilon_y - b_1\varepsilon_m)$ .

This is the model that is feasible to estimate, given our observations. The new error term is $(\varepsilon_y - b_1\varepsilon_m)$. The measured value of x is correlated with $\varepsilon_m$, so it is correlated with the error term. Thus, measurement error introduces OVB. This is not the same indirect bias that we were concerned with in the supply and demand example – it arises simply because there was noise (imperfection) when we measured x. Intuitively, when x changes, there is no way to know if it is changing in truth, or only as the result of a measurement error. So, if no change in y is observed, there is no way to know if that is because y is not related to x ($b_1 = 0$) or because the observed change in x was only measurement error. Thus, OLS cannot identify the true underlying relationship and separate it from the noise.
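A small simulation makes the attenuating effect of measurement error visible. This is an illustrative sketch with made-up parameter values, not anything estimated from real data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
b0, b1 = 2.0, 3.0                       # hypothetical true coefficients

x = rng.normal(size=n)                  # true regressor
y = b0 + b1 * x + rng.normal(size=n)    # true model
x_m = x + rng.normal(size=n)            # observed x = truth + measurement noise

# OLS slope of y on the mismeasured x is biased toward zero (attenuation).
slope = np.polyfit(x_m, y, 1)[0]
print(slope)  # roughly b1 * var(x)/(var(x)+var(noise)) = 3 * 0.5 = 1.5, not 3
```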

Reducing Bias

It is possible to take steps to lessen the degree and impact of omitted variables bias. We will briefly discuss five: market trials, natural experiments, instrumental variables, panel data techniques, and regression discontinuity designs. The treatment of each of these will be very brief and incomplete. It is intended to give you an intuitive idea of the types of things that are done to reduce OVB, so that you will be in a better position to evaluate published studies and consultant reports, not to give you the technical expertise to follow every detail of such studies or to conduct them yourselves. For those interested, a quick search of any of these topics online will turn up a large number of examples and references.

Market Trials

First, a firm could run a market trial; that is, it could release a certain product in "test markets" that are essentially a treatment group, while holding other markets constant. One problem with this method is that you don't know how randomized your treatment group is. Customers in the trial markets are likely to exhibit systematic differences from those in other markets (the control group) even if the trial markets are randomly selected – unless the number of trial markets is very large. Second, it is unlikely customers will be ignorant of the experiment for long. If they figure out what's going on, the experiment is no longer double blind, and if certain customers are getting a lower price and realize it, they may buy more than



they otherwise would and invalidate the experiment. Another problem is that charging the treatment group a price different from the expected profit‐maximizing price means a firm is potentially foregoing several months of profit.

Natural Experiments

The second technique is to find a "natural experiment" with which to identify the effect of the variables of interest. Natural experiments are events clearly outside the influence of markets or their participants which introduce exogenous (outside the model) changes in explanatory variables that are otherwise determined inside the model. A potential example of a natural experiment is a government policy that has large impacts on some locations and no impact on other locations. The key to identifying a natural experiment is finding a place where an outside influence caused an exogenous change in explanatory variables BUT where that outside influence should have no direct effect on the dependent variable itself.

Take the following as an example. Say you want to know what the effects of the minimum wage are on employment. You know demand curves slope down, and you think raising the wage will decrease the quantity of unskilled labor demanded. So, you hypothesize that minimum wages (just like other price supports) cause surpluses of low skill labor, and you test that. The minimum wage varies by state, so you gather data for the whole country and regress unemployment rates against the minimum wage data. Suppose your results tell you that states with higher minimum wages have lower unemployment. This is the opposite of your original hypothesis – but the problem is, this research design likely suffers from omitted variables bias. It is entirely possible that states with higher unemployment have lower minimum wages for reasons totally unrelated to the direct effect of minimum wages on employment levels. Perhaps locations with a high demand for labor and low unemployment set a wage which is nominally high but does not matter much, because few people would work at that wage in equilibrium anyway. So, it's possible that states with high minimum wages have other characteristics that would make unemployment lower that have nothing to do with the minimum wage. This is the problem of causal identification.

How can natural experiments help in this example? Suppose the federal government institutes an increase in the federal minimum wage, and some states are affected by it and others are not. This is an example of a natural experiment, since the individual states were not in control of the change. Now, you can compare the change in unemployment between the states that were affected by the increase and those that weren't, using D.I.D. measures. Since you didn't spend any money on conducting this experiment, and it was more of a natural byproduct of exogenous changes, it is an example of a natural experiment. The fact that the change comes from the federal government arguably makes assignment of individual states to the "treatment" group(s) random enough for the statistical analysis to be meaningful.

Or, maybe, not so much. In our example, it's possible the states where minimum wages were already high had systematic differences in their related policies (such as



welfare, subsidized housing, etc.) and market characteristics (industries, education, demographics) that mitigate the effects of changes in the minimum wage (that is, that interact with the change in the minimum wage in determining labor supply and demand). Or, perhaps changes in the expected impact of a high minimum wage in the states where the federal change led to real changes in the minimum wage are what led to the federal policy change in the first place. If changes in underlying conditions in the states affected by the policy are what caused the policy to be enacted at any given time, the change was not exogenous, and the experiment would no longer avoid the endogeneity problem.

Instrumental Variables

A third way to reduce OVB is to use instrumental variables. Let's start with an example of an instrumental variable, and then proceed with a more general definition. Suppose quantity demanded is

$Q_D = a - b_p p + b_m m + \varepsilon_D$ ,

where p is price and m is income; quantity supplied is

$Q_S = c + d_p p - d_w w + \varepsilon_S$ ,

where p is price and w is the wage rate. Now, looking at the equation for demand, we know price is correlated with the demand error term since price is endogenously determined. The idea with instrumental variables is to find a variable that is correlated with price, but not with the demand error. Then, we can use that variable to identify changes in price that have nothing to do with changes in the demand error, and use those changes to estimate the demand curve. A variable of this type is called an instrument because you use it to identify exogenous variation in endogenous variables. Looking at our equations for supply and demand, if we were to solve for the equilibrium, we would get both quantity and price as functions of income, wages, and the two error terms, or

$Q^E = f(m, w, \varepsilon_D, \varepsilon_S)$ , $p^E = g(m, w, \varepsilon_D, \varepsilon_S)$ .

If we were now to regress price against these variables, we would have a predicted equilibrium price given by the following:

$\hat{p}^E = \alpha_0 + \alpha_1 m + \alpha_2 w$ .

This predicted price is determined by income and wages, and is not correlated with the demand error so long as income and wages are exogenous to the model. Thus, we can use it as an instrumental variable, and plugging it into our equation for quantity demanded we can run a new regression:

$Q_D = a - b_p \hat{p}^E + b_m m + e$ ,



where e is a different error term than the one in the original equation. The major problem with using instrumental variables is that, if you look at a broad enough picture of the economy, almost every variable depends on almost every other variable. Since we’ve developed a function to predict price based on income and wages, we’re assuming that income and wages are exogenous – but that may not be true. In fact, wages are a primary determinant of income, and many factors that affect income and wages are likely to also affect demand directly. So, for a variable to be an instrument it has to be (a) uncorrelated with the error term, (b) not directly in the equation, but (c) correlated with the endogenous variables that are in the equation – and such variables can be quite difficult to find.
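The two-stage procedure just described is easy to sketch with simulated data. All structural parameters below are hypothetical; the point is only to show that regressing quantity on the actual price gives a biased slope because of the simultaneity, while regressing it on the predicted price recovers the true demand slope. (This is the logic of two-stage least squares.)

import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical structural parameters for the simulation
a, b, bm = 100.0, 2.0, 1.0   # demand:  Q = a - b*p + bm*m + eD
c, d, dw = 10.0, 3.0, 1.5    # supply:  Q = c + d*p - dw*w + eS

m = rng.uniform(20, 40, n)   # income (exogenous)
w = rng.uniform(5, 15, n)    # wages (exogenous)
eD = rng.normal(0, 5, n)
eS = rng.normal(0, 5, n)

# Equilibrium: quantity demanded equals quantity supplied
p = (a - c + bm * m + dw * w + eD - eS) / (b + d)
Q = a - b * p + bm * m + eD

# Naive OLS of quantity on actual price and income (biased)
X_ols = np.column_stack([np.ones(n), p, m])
b_ols, *_ = np.linalg.lstsq(X_ols, Q, rcond=None)

# Stage 1: predicted price from the exogenous variables
Z = np.column_stack([np.ones(n), m, w])
alpha, *_ = np.linalg.lstsq(Z, p, rcond=None)
p_hat = Z @ alpha

# Stage 2: regress quantity on predicted price and income
X_iv = np.column_stack([np.ones(n), p_hat, m])
b_iv, *_ = np.linalg.lstsq(X_iv, Q, rcond=None)

print(f"true price slope: {-b}, OLS: {b_ols[1]:.2f}, IV: {b_iv[1]:.2f}")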

Panel Data Techniques
A fourth method for reducing OVB is to collect panel data. The idea is to follow a group of people (or cities, etc.) over a long period of time. In a time series, data is collected on one individual over time periods t = 1, 2, …, T. Cross section data is information on several individuals i = 1, 2, …, N in a single time period. Panel data combines these ideas, and follows N individuals over T time periods. Panel data can help by isolating idiosyncratic differences between individual observations. That is, suppose customers in Atlanta like your product for some unknowable reason, and customers in Gainesville don’t like your product for some unknowable reason. As a result, the manager in Atlanta sets a higher price and sells more units. You don’t want to conclude that your demand curve slopes up just because you can charge a higher price in Atlanta and sell more. We might try to correct for systematic but unobserved differences across cities (or individuals) by including an indicator, or dummy, variable for each city. The model would look something like

yit = β0 + β1 x1it + β2 x2it + … + βK xKit + αt + δ2 d2 + δ3 d3 + … + δN dN + εit.

In the model, yit is the dependent variable (price, etc.) in city i and time period t. It depends on K independent variables x1it, x2it, and so on, which are also associated with that city and time period. A possible time trend is captured by including t, and α represents the general increase in the dependent variable per unit of time, which is constant for all years and all cities. The model also contains indicator variables (the d’s) for each different city, in order to statistically pick up the idiosyncratic differences between them. So, d2 could be for Gainesville, d3 for Ocala, d4 for Orlando, etc.; each equals 0 if you are not observing that city and 1 if you are. The coefficients (the δ’s) measure the average difference between each category and the “base” category, which is taken to be category 1 in the equation above. Using summation notation, the model could be written more compactly as:

yit = αt + ∑k=0,…,K βk xkit + ∑i=2,…,N δi di + εit.



In this notation, the first x variable is simply a 1 for each observation, as in the earlier chapters. You can then compare the data across the observed years. The change in y is

Δyit = yit − yi(t−1)

and, plugging our equation for yit and yi(t−1) into this equation, we get

yit − yi(t−1) = α(t − (t−1)) + (β0 − β0) + ∑k=1,…,K βk (xkit − xki(t−1)) + ∑i=2,…,N δi (di − di) + εit − εi(t−1)

or

Δyit = α + ∑k=1,…,K βk Δxkit + Δεit.

In this equation, the intercept, α, reflects the average growth across all cities over time. So, when looking at the change in y, the constant and the dummy variables drop out. In essence, looking at the change in yit has wiped out all of the fixed idiosyncratic differences that don’t change over time. It does not matter whether we actually estimate this model in the form of differences or just include dummy variables for each city; the effect is the same in practice. The problem inherent in this method is that, while the model eliminates fixed city-specific differences, it has no way of controlling for time-varying city-specific differences. For example, if demand is higher for whatever reason in Atlanta than in Gainesville in year 1, there is no reason to think that it will stay the same amount higher in year 2, and thus endogeneity will be present in the regression. If, for example, unmeasured income is trending up more in Atlanta than in Gainesville, a good manager will likely be raising price every period MORE in Atlanta, and experiencing relative increases in quantity sold. Dummy variables, or differencing the data (which is equivalent), will only take care of fixed idiosyncratic differences between cities, and only to this extent let you identify causal relationships. They do nothing for time-varying idiosyncrasies.
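Here is a minimal first-differencing sketch in Python, again with simulated data and purely hypothetical parameters. Cities with a high unobserved taste for the product (the fixed effect) are assigned higher prices, which biases a pooled regression; differencing wipes out the fixed effects and recovers the true slope.

import numpy as np

rng = np.random.default_rng(2)
N, T = 100, 6    # simulated cities and years
beta = -1.5      # assumed true effect of x (say, price) on y (quantity)

delta = rng.normal(0, 3, N)                    # fixed city idiosyncrasies
# x is set higher where delta is higher, creating the endogeneity
x = 10 + 0.8 * delta[:, None] + rng.normal(0, 1, (N, T))
y = beta * x + delta[:, None] + rng.normal(0, 1, (N, T))

# Pooled OLS ignores the fixed effects and is biased
Xp = np.column_stack([np.ones(N * T), x.ravel()])
b_pooled, *_ = np.linalg.lstsq(Xp, y.ravel(), rcond=None)

# First differences remove the fixed effects entirely
dx, dy = np.diff(x, axis=1).ravel(), np.diff(y, axis=1).ravel()
Xd = np.column_stack([np.ones(dx.size), dx])
b_fd, *_ = np.linalg.lstsq(Xd, dy, rcond=None)

print(f"true beta: {beta}, pooled: {b_pooled[1]:.2f}, differenced: {b_fd[1]:.2f}")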

Regression Discontinuity Designs
Let’s consider an example. Florida has what is generally considered to be a stringent accountability system for its public K-12 schools, at least compared to the systems put in place in other states to comply with the No Child Left Behind Act of 2001. Suppose you would like to study the effect of school accountability measures. Specifically, suppose you want to know whether the extra resources and pressure a school receives as a result of a “failing” grade lead to improvements in student performance, or to changes in something like class size or hours of instruction. That is, consider a school that received a “D” in 2005 and an “F” in 2006. If the additional pressure and resources lead to improvements in student performance, then the test scores of individual students in those schools in 2007 should increase relative to their 2006 level, at least as compared to schools that did not “fail” in either 2005 or 2006. To test this, we might compare the change in student scores on standardized tests in schools that went from a “D” to an “F” the previous year to the changes in standardized scores for schools that were a “D” in both years. We could just test for differences in means. Alternatively, we could do it as a regression model. First, define a dummy variable, Failit, which is 1 for student i if their school received a failing grade in year t after receiving a D in year t−1, and 0 if it received a D both years. (We are excluding students at schools that received other grades, to keep the comparison as clean as possible.) If Testit is student i’s test score in year t, and again Δ represents “change in,” the model is:

ΔTestit = β0 + β1 Faili(t−1) + εit.

The hypothesis implies β1 is positive. The hypothesis is not that students do better in failing schools – by definition they do worse. The idea is that in the year after a school fails, the students learn more than they did the year the school failed, due to the pressure put on a failing school and the extra resources that flow to it. The problem with running the model above, or testing for differences in mean test score changes across the groups (the two are statistically equivalent), is that school grades are likely to be correlated with a number of other characteristics of the school and its students that also affect test scores. It will be possible to control for some of these things, but not all of them. And you will never know how many confounding omitted variables there are, or how important they are. So, what to do? A regression discontinuity approach potentially offers a way out. It works basically as follows. Find the schools that just barely passed and the schools that just barely failed, and compare changes in student test scores in those schools. Except for random differences, these schools should be similar. As long as the threshold for passing or failing is hard, and as long as every school that “fails” gets the same “treatment” and no school that does not fail gets that treatment, this is essentially equivalent to a randomized trial. The technique is limited in that it can only be applied to cases where the “treatment” is received only when some variable crosses a well-defined threshold – here, when a school’s test scores fall below an established level. Further, when the threshold must be met to receive the treatment, but not everyone that meets the threshold gets the treatment (crossing the threshold is necessary for “eligibility” but does not guarantee that all who are eligible get the same treatment), the technique becomes more complicated and more prone to bias.
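The discontinuity logic can be sketched as follows; the school scores, cutoff, and treatment effect are all hypothetical. A naive comparison of all failing and all passing schools mixes the treatment effect with the smooth relationship between prior scores and score growth, while comparing only schools in a narrow band around the cutoff isolates the effect of the failing-grade “treatment.”

import numpy as np

rng = np.random.default_rng(3)
n = 4000                       # simulated schools

score = rng.normal(60, 10, n)  # prior-year grading score; "fail" below 50
fail = score < 50.0            # hard threshold determines treatment
effect = 2.0                   # assumed gain in test growth from failing-school resources

# Score growth depends smoothly on the running variable plus the treatment
growth = 0.05 * (score - 50) + effect * fail + rng.normal(0, 1.5, n)

# Naive comparison of all failing vs. all passing schools is contaminated
naive = growth[fail].mean() - growth[~fail].mean()

# Regression discontinuity: compare only schools near the cutoff
band = np.abs(score - 50) < 2.0
rd = growth[band & fail].mean() - growth[band & ~fail].mean()

print(f"naive: {naive:.2f}, RD near cutoff: {rd:.2f}, true effect: {effect}")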

In summary, all five of these techniques can be used to reduce OVB, but none of them is foolproof. It is important to keep this in mind when looking at empirical results of any kind. This is a good point to again emphasize that no result in economics, or any other field of study that does not have ready recourse to carefully controlled experimentation, should be taken seriously unless all three of the following conditions are met.


First, the result is consistent with a believable, reasonable, understandable story. You may have to think hard to grasp why the story is sensible, but if you don’t eventually understand it at an intuitive level, be suspicious.

Second, the result is consistent with a simple but formal mathematical model that captures the essence of the argument, even if it is based on vast oversimplifications. The reason is that thinking about the problem mathematically forces your logic to be consistent. Without this, a persuasive speaker who knows how to appeal to human fallibilities and tendencies toward systematic biases in reasoning can make just about anything sound reasonable.

Third, the result is confirmed repeatedly, using multiple identification strategies, by multiple independent researchers using multiple independent sources of data. This is not enough on its own because, absent the ability to conduct repeated randomized double-blind trials, a dishonest person can use statistical tools to make just about anything seem to be supported by the data if they try long enough, and even the work of the most careful and honest researchers is subject to tremendous potential biases and difficulty definitively identifying causal relationships.

If some finding or recommendation cannot be supported in all three of these ways, it should be treated with a high degree of skepticism. Even if it meets all three criteria, that does not prove the finding is correct in any fundamental way. It just means we have a working theory that has not yet been disproved or supplanted by some better theory, and which therefore might provide the most reasonable basis currently available on which to proceed. On one hand, it is always best to retain some degree of skepticism and proceed with caution. On the other hand, being completely skeptical of everything is simply not feasible – it leads to indefinite vacillation. The only way forward is to make use of the “best” findings in economics and other policy-related disciplines, but to keep in mind that they are at best incomplete approximations, more applicable to some situations than others, likely biased, and certainly not the final truth.



Chapter 6 Terminology
The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Difference in Difference (DID) – An econometric tool used to measure the effect of a treatment on a group by comparing the treated group’s change over time to the change for an untreated comparison group.

Omitted Variable Bias (OVB) – A type of bias that occurs when a variable that helps determine the dependent variable is omitted from the model and is correlated with the included explanatory variables. It is a problem because it makes it impossible to separate the effect of the explanatory variable from the effect of the omitted variable.

Random Double-Blind Trial (RDBT) – An experiment that works to eliminate subjective bias, in which neither the experimenters nor the subjects know which group is the control group and which is the experimental group. Such trials can be extremely costly, time-consuming, or impossible in economics.

Market Trial – A type of experiment that uses a test market to test price changes. The test market is essentially a treatment group, with other markets held constant as controls. Problems can arise because market trials are not double-blind, customers are not randomly assigned across markets (so it is impossible to know for sure the results are unbiased), and they can be very expensive because the firm is not setting price at the profit-maximizing level.

Endogeneity/Simultaneous Equations Bias – A type of bias that occurs when an explanatory variable is correlated with the error term in a regression model.

Natural Experiment – An event outside the influence of the market participants being studied that causes exogenous changes in variables otherwise determined within the system. D.I.D. measures can then be used to compare the impact on affected and unaffected groups (for example, of a federal change on each state). Problems arise if the groups change over time at different rates or if the timing of the natural experiment is endogenous.

Panel Data – Data that follows a group of individuals (or cities, etc.) over a period of time. Indicator variables, or equivalently differencing, can then be used to remove fixed idiosyncratic differences between individuals. The limitation is that this only works for individual-specific effects that are fixed over time.

Instrumental Variable – A variable that is not directly included in the model, is correlated with the endogenous right-hand-side variable, and is not correlated with the error term. It is used to identify changes in the endogenous variable (e.g., price) that have nothing to do with the error (e.g., the demand error), so that the slope can be estimated from those changes. Such variables are hard to find.



Part 3 A Closer Look at Some of the Tools



Chapter 7 Individual Choice
In chapters two and three we used demand curves to represent the consumer side of the market, in particular the way price is related to the quantity demanded and the way income levels and the prices of substitutes and complements affect demand. Consumer theory is the study of how the underlying preferences of individuals and the limits imposed by income and market prices shape individual choices. This framework gives rise to a more detailed theory of demand. It also has applications to interesting economic questions in its own right.

Underlying Assumptions about Preferences
Consumers choose from among many possible combinations of any number of goods to consume. Each particular combination of the goods is called a bundle of those goods. We model consumers as if they choose their most preferred bundle from among the feasible options. Utility is the name of the metric economists use to measure preference. It is what is known as an ordinal measure, not a cardinal one. That is, it assigns a higher number to more preferred bundles than to less preferred alternative bundles. The units of utility are arbitrary; it is only the ranking that matters. If one bundle is preferred to a second bundle, it does not matter if the utility of the first is 10,000 and the utility of the second is 0.0001, or if the utility of the first is 4 and the utility of the second is 3.9999. All that matters is that the number associated with the first is higher than the number associated with the second. Statements like “bundle A is twice as good as bundle B” are not meaningful with the utility metric, as “twice as good as” has absolutely no meaning with an ordinal measure.

In order to develop the theory of individual choice, we first list and describe our basic assumptions about individual preferences. The first assumption is completeness. This means that when faced with a choice between any two bundles, call them bundle A and bundle B, the consumer either prefers A to B (denoted A ≻ B), prefers B to A (denoted A ≺ B), or is indifferent between the two (denoted A ~ B). These are the only three logical possibilities, and preferences are complete in the sense that the consumer can make one and only one of these three statements about any two possible bundles of goods.

The second assumption is that more is better. This means consumers are better off if they have more of one good and no less of all the others, all else equal. So, if B has more pizza than A and as much of everything else, B ≻ A. Keep in mind here we are discussing preferences only, not the cost of attaining the bundles or what the consumer would ultimately choose given the cost. The assumption that more is better is assumed to hold in the range of combinations of goods a consumer would normally purchase, not necessarily for any possible combination. For example, when choosing between five pieces of pizza and 10,000 pieces of pizza, a consumer invariably would choose the five pieces, due to the infeasibility of eating and storing all the extra pizza. Although this seems to violate the assumption that more is better, 10,000 pieces of pizza is not in the relevant range – we will not be analyzing choices over bundles with 10,000 pieces of pizza. What about trash? People prefer less trash – does that violate the assumption? Not if we define things in terms of trash removal. We assume the good is trash removal, which people would prefer to have more of (trash is a “bad”, not a “good”).

Third, when faced with more than two bundles to choose from, the consumer’s choices must satisfy transitivity. That is, if A ≻ B and B ≻ C, then A ≻ C. For example, a consumer cannot say they’d rather have a hamburger than a pizza and a pizza rather than a burrito, but that they’d rather have a burrito than a hamburger.

The fourth and final assumption relates to a consumer’s marginal rate of substitution between two goods, so we must first define it. A consumer’s marginal rate of substitution of good X for good Y (denoted MRSXY) is the amount of good Y he is willing to give up for one more unit of good X while preserving his level of happiness. For example, if a consumer currently has two pieces of pizza and three cokes, he may be willing to give up two cokes in exchange for one more piece of pizza. Thus, his marginal rate of substitution of pizza for coke is 2. Note that this value is different from his marginal rate of substitution of coke for pizza. It will have occurred to the thoughtful reader that this rate varies according to how much of each good the consumer has. For example, the consumer may be willing to give up two cokes for one more piece of pizza when he has three cokes; however, this number will surely change if he has only one coke. In fact, as we can intuitively conclude, the less coke the consumer has, the less he is willing to give up for one more piece of pizza. This is actually the fourth assumption, and it is succinctly known as a diminishing marginal rate of substitution.

If you believe these four assumptions to be true, then a consumer’s choice between bundles of goods may be expressed as a mathematical maximization problem in which the consumer maximizes a utility function (where the ranking matters but the units are arbitrary) subject to the constraint imposed by limited income and market prices.

Indifference Curves and Preferences
An indifference curve shows different combinations of two goods that provide the same amount of satisfaction, or utility. If we are equally happy with two cokes and one piece of pizza as we are with one coke and three pieces of pizza, we are indifferent between those two options – hence the term indifference curve. An indifference curve for two goods can be depicted on a graph where one axis measures the amount of one good (say good X) and the other axis measures the amount of the other good (say good Y). The figure below depicts three types of indifference curves. The “L” shaped indifference curve depicts the preferences of a consumer who views goods X and Y as perfect complements. Remember, the indifference curve shows different combinations of X and Y that make the consumer equally happy. So, for a given amount of good X, additional units of good Y do not move the consumer off of his indifference curve; that is, they don’t increase his satisfaction. In this case,



the goods have no substitutability, and utility can only be increased if he has some additional amount of both goods – they are consumed together or not at all. For example, someone who has no use for either peanut butter or jelly individually, ever, under any circumstances, but will consume them together, views them as perfect complements.

[Figure: three types of indifference curves – perfect complements (“L” shaped), perfect substitutes (a straight line), and imperfect substitutes (a convex curve) – with good X on the horizontal axis and good Y on the vertical axis.]

If the consumer’s indifference curve looks like the straight line in the figure, goods X and Y are perfect substitutes. In essence, the consumer is willing to trade a specific amount of Y for one more unit of X, and this specific amount never changes. For example, if a consumer would always be willing to give up two cokes for one piece of pizza, no matter how much coke and pizza he has, the goods would be perfect substitutes to him. The third indifference curve in the figure represents imperfect substitutes. The consumer is willing to trade some Y for one more unit of X, but the amount of Y he is willing to give up for another unit of X decreases as he has more X and less Y. This indifference curve is consistent with the assumption that more is better and the assumption of a diminishing MRS.

Beginning at point a in the next figure, if the amount of good X increases by one unit, then in order to stay on the same indifference curve the amount of Y consumed must decrease from Y3 to Y2. Otherwise, we’d have more utility, since more is better. This is why indifference curves have a negative slope.

[Figure: a convex indifference curve through points a, b, and c, with Y falling from Y3 to Y2 to Y1 as X increases by one unit at each step.]

Indifference curves with this shape also exhibit a diminishing MRS. Going from point a to point b, the consumer gains one unit of X. For this unit of X, the consumer is willing to reduce their consumption of Y from Y3 to Y2. But, going from point b to point c, which is also a one-unit gain of X, the consumer is only willing to decrease Y from Y2 to Y1, a smaller decrease. This is because at point b the consumer has less Y and more X, so Y becomes more valuable in terms of X. This is why indifference curves are convex to the origin – the diminishing marginal rate of substitution. This property is also sometimes expressed as a preference for variety. From the last figure, it should be clear that the MRSXY is just the absolute value of the rate of change of Y with X along the indifference curve. The rate of change at a particular point is just the slope of the tangent at that point. Saying the MRSXY diminishes along an indifference curve is saying that a line tangent to the



indifference curve gets flatter as X increases, since the tangent IS the graphical representation of the concept of the MRS.

Indifference curves cannot cross, as the two in the figure below do. Why? We know that a has more of both good X and good Y than b, so a ≻ b. But b and c are on the same indifference curve, and since indifference curves are defined to be different combinations of goods that give the same amount of utility, b ~ c. Transitivity then tells us that since a ≻ b and b ~ c, a ≻ c. But a and c are on the same indifference curve, so this cannot be true. So indifference curves cannot cross.

[Figure: two indifference curves crossing at point c, one passing through point a and the other through point b.]

Instead, a graph with multiple indifference curves must look something like the one in the next figure. The figure shows three indifference curves labeled 1, 2, and 3. The figure labels each indifference curve according to the X or Y value where it crosses the 45 degree line, at which point Y=X. Completeness means that every possible consumption bundle is on some indifference curve which is associated with some (ordinal) measure of utility. That means that any number of indifference curves may pass between indifference curve 1 and indifference curve 2, and between any two other indifference curves.

[Figure: three non-crossing indifference curves labeled 1, 2, and 3 by the points where each crosses the 45 degree line Y=X.]

Indifference curve 2 represents higher utility than indifference curve 1. This is obvious since, at the points where they cross the 45 degree line, where Y=X, indifference curve 2 has more of both goods than indifference curve 1. Since indifference curves above and to the right involve higher consumption levels, they represent higher utility, or more preferred combinations of goods. Similarly, indifference curve 3 represents a higher level of utility than indifference curve 2. Since the units of utility are arbitrary, and only ranking matters, we could as easily have called the curves 0.7, 26.263, and 1,140.3.

Utility Functions, Marginal Utility, and the MRS
Graphical representations of indifference curves are quite useful. For some purposes, though, it is far more convenient and powerful to have a mathematical tool to focus and check our reasoning. Above we claimed it is possible to represent any preferences that satisfy the four assumptions described above with a mathematical function. Such a function is known as a utility function, and would be denoted U(X,Y) if the two goods consumed are X and Y. This claim may sound unreasonable. After all, how can something as subjective as preferences, which have no natural numeric scale beyond a simple rank ordering, be represented in an equation? Actually, we already showed why and how the claim is true in the last figure above. Any bundle of X and Y lies on some indifference curve. That indifference curve represents all bundles of X and Y that provide the same level of satisfaction. So, we can map the bundle (X,Y) to a specific number that signifies the indifference curve it falls on. Call that number U(X,Y). All points on that indifference curve get that number. All higher indifference curves get progressively higher numbers. All lower indifference curves get progressively lower numbers. In the figure, we assigned those utility numbers by the point where the indifference curve crosses the 45 degree line, but the exact numbers are not important. Saying that combination (X3,Y3) ≻ (X2,Y2) is equivalent to saying U(X3,Y3) > U(X2,Y2). Thus, even if we have no idea what form it takes, we know a utility function exists that represents any consumer’s preferences that satisfy the four assumptions stated above. As we’ve stated before, the units of utility are arbitrary; all that is necessary for a utility function to represent a consumer’s preferences is that it preserve the ranking given by those preferences. Let’s illustrate this point with an example. Consider the following three utility functions: U1 = X + Y + XY, U2 = ln(X + Y + XY), and U3 = 2(X + Y + XY).

All three increase as X + Y + XY increases. So, we could write the second two as U2 = ln(U1) and U3 = 2U1. While the scale and exact shape of each is different, it should be clear that they rank any combination of X and Y in the same order. To be concrete, two possible combinations of X and Y are considered in the table below.

X   Y   U1   U2     U3
3   1    7   1.95   14
2   2    8   2.08   16

We can see that utility function U1 ranks the second bundle (2,2) higher than the first (3,1), as do the other two. The scale doesn’t matter – only the ranking. All three of these utility functions map the bundles to the same indifference curves. While a utility function exists representing any preferences that satisfy our assumptions, it is by no means unique. But, there is only one underlying ranking of possible consumption bundles that is consistent with any consumer’s preferences.
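That only the ranking matters is easy to verify numerically. This sketch evaluates the three utility functions above on a few arbitrary (hypothetical) bundles and shows that all three order the bundles identically.

import numpy as np

U1 = lambda X, Y: X + Y + X * Y
U2 = lambda X, Y: np.log(X + Y + X * Y)
U3 = lambda X, Y: 2 * (X + Y + X * Y)

bundles = [(3, 1), (2, 2), (1, 5), (4, 0.5)]
for name, U in [("U1", U1), ("U2", U2), ("U3", U3)]:
    vals = [U(x, y) for x, y in bundles]
    order = np.argsort(vals)            # least to most preferred
    print(name, [bundles[i] for i in order])
# All three print the same ordering of bundles.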

The marginal utility of a good is the change in utility that results from a small increase in consumption of that good. The partial derivative of the utility function with respect to the amount of good X consumed gives the marginal utility of X, MUX. So, if utility depends only on X and Y, the marginal utilities are as follows:

MUX = ∂U/∂X
MUY = ∂U/∂Y.    (7.1)

The marginal utilities are basically how far up the indifference map the consumer moves when they have a little more of one good. Suppose the marginal utility of X is 3 and the marginal utility of Y is 0.5. Then, if the consumer gets one more unit of X, how many units of Y could they give up while retaining the same level of utility? Since another X increases utility by six times the amount of another Y at the margin, the consumer could give up 6 units of Y for one X while remaining on the same indifference curve. Thus, the marginal rate of substitution of X for Y is equal to the ratio of the marginal utility of X to the marginal utility of Y:

MRSXY = MUX / MUY.    (7.2)

We noted above that the MRS was just the absolute value of the slope of the indifference curve. Another way to see that is to note that the change in utility, dU, for small changes in X and Y, dX and dY, can be expressed as:

dU = MUX dX + MUY dY    (7.3)

This just says utility increases by MUX per unit increase in X plus MUY per unit increase in Y, at least for very small changes. Along an indifference curve, dU = 0. So, this can be rearranged as follows:

MUX dX + MUY dY = 0
MUY dY = −MUX dX
dY/dX = −MUX / MUY.    (7.4)

Thus, along a given indifference curve, the slope of the indifference curve at any given point is given by the negative of the ratio of the marginal utility of X to the marginal utility of Y, which is the negative of the MRSXY as well.
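Equation (7.4) can be checked numerically. The sketch below uses the hypothetical utility function U = X + Y + XY from earlier, for which MUX = 1 + Y and MUY = 1 + X. It takes a small step along an indifference curve and compares the resulting slope with −MUX/MUY.

# For U = X + Y + X*Y, an indifference curve at utility u0 satisfies
# Y = (u0 - X) / (1 + X), which lets us step along it exactly.
U = lambda X, Y: X + Y + X * Y
MUx = lambda X, Y: 1 + Y     # dU/dX
MUy = lambda X, Y: 1 + X     # dU/dY

X0, Y0 = 2.0, 3.0
u0 = U(X0, Y0)

dX = 1e-6
Y1 = (u0 - (X0 + dX)) / (1 + (X0 + dX))   # stay on the indifference curve
slope = (Y1 - Y0) / dX

print(f"numerical slope dY/dX: {slope:.4f}")       # about -1.3333
print(f"-MUx/MUy:              {-MUx(X0, Y0) / MUy(X0, Y0):.4f}")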

Budget Constraints
Preferences are the first major component of consumer theory. Budget constraints are the second. The way we graphically represent budget constraints is by using, creatively enough, a budget line. The equation of the budget line, when there are only two goods, is

m = pX X + pY Y,    (7.5)

where m is the consumer’s total budget, pX and pY are the prices of good X and good Y, and X and Y are the amounts of good X and good Y purchased. So, if we are on this line, our income, which is the amount of money we have to spend on both goods, is equal to the amount of money that we actually spend on the goods. Generally, the budget line can be defined for any number of goods. It just expresses the fact that income must equal expenditures. This does not preclude saving. If we wanted to analyze saving, we would just let savings be one of the goods upon which income is spent. The budget line simply depicts the bundles of goods the consumer can afford. Solving this equation for Y, we obtain

pY Y = m − pX X
Y = m/pY − (pX/pY) X.

This is the budget line in slope-intercept form. The slope of the budget line is the negative of the ratio of the price of X to the price of Y, −pX/pY. The Y intercept in the equation above is m/pY. That is the amount of Y that could be purchased if no X were purchased. Similarly, the X intercept is m/pX. A graph of the budget line looks like the figure below.

[Figure: the budget line running from the Y intercept m/pY to the X intercept m/pX, with slope −pX/pY.]

The interpretation of the slope of the budget line will prove to be important. In this context, the slope tells us the rate at which Y can be exchanged for another unit of X at market prices. For example, if the price of a coke is $1.50 and the price of a slice of pizza is $3, we have to give up half a slice of pizza per additional coke purchased. Generally, since income is fixed, as the amount of good X purchased increases, the amount of Y purchased must fall. Since how much of good Y we have to give up in order to obtain one more unit of good X is based on their relative prices, it should make sense that the slope of the budget line is the negative of the price ratio.

Example: Budget Line

If the price of beer is $1, the price of pizza is $2, and a consumer has $50 to spend on pizza and beer, find and graph the consumer’s budget line.

Solution: The budget line is

m = pB B + pP P
50 = B + 2P

Since we have $50, and one piece of pizza costs $2, we can buy a maximum of m/pP = 50/2 = 25 pieces of pizza. Similarly, if we buy no pizza, at $1 per beer we can buy a maximum of m/pB = 50/1 = 50 beers. These intercepts give us our budget line:

[Figure: the budget line with beer on the horizontal axis (intercept m/pB = 50) and pizza on the vertical axis (intercept m/pP = 25), with slope −pB/pP = −0.5.]



Individual Choice
Now that we have the two basic tools of consumer theory, we combine them to analyze consumer decisions. Consumers choose their preferred bundle from among those that are possible. That is, they want to maximize their utility, or reach the highest possible indifference curve, given their budget constraint. First we look at the problem graphically, then mathematically. The figure below shows three indifference curves for a hypothetical consumer, along with their budget line. Indifference curve 3 is out of reach, given their budget. Indifference curve 1 is in reach, but it should be obvious that it is not the highest indifference curve that can be reached. The highest indifference curve that can be reached is represented by indifference curve 2. While it is easy to understand this graph, as far as it goes, it is important to have a more precise understanding of the solution, and to be able to explain that understanding precisely in words.

[Figure: a budget line with intercepts m/pY and m/pX and indifference curves 1, 2, and 3; curve 2 is tangent to the budget line, while curve 3 lies beyond it.]

Interpretation of the Solution of the Individual’s Choice Problem
Notice that at the solution, the budget line is tangent to the indifference curve. We know that the slope of the budget line is the negative of the price ratio, and that the slope of the tangent to the indifference curve is the negative of the marginal rate of substitution. Thus, at the point on the budget line where the individual reaches the highest possible indifference curve,

MRSXY = pX / pY.    (7.6)

We will refer to this type of condition as an optimality condition. At the solution, the amount of Y the consumer is willing to give up for one more unit of X must equal the amount of Y he would have to give up to get another unit of X in the market. Intuitively, if the value of another unit of X in terms of Y is higher than the cost of another unit of X in terms of Y, the consumer will choose to buy more X and less Y. On the other hand, if the value of another unit of X in terms of Y is less than the cost of another unit of X in terms of Y, the consumer will choose to buy more Y and less X. Making use of the fact that the marginal rate of substitution is equal to the ratio of the marginal utilities, we can rewrite the optimality condition as follows:



MUX / MUY = pX / pY
MUX / pX = MUY / pY.    (7.7)

The interpretation of this version is that the marginal utility per dollar spent on good X must equal the marginal utility per dollar spent on good Y. If the marginal utility of the last dollar spent on X exceeds the marginal utility of the last dollar spent on Y, the consumer could increase utility by buying more X and less Y, and vice versa. Remember, utility just corresponds to indifference curves. If the marginal utility per dollar spent is higher for good X than for good Y, that just means that spending one dollar more on X will move the consumer up more indifference curves than he would move down by spending one dollar less on Y in order to free up a dollar to spend on X. This optimality condition readily generalizes to more than two goods. It just says that the relative value of any two goods consumed must equal the relative cost, or that the marginal utility per dollar, or “bang per buck”, should be equal for all goods. At a consumer’s most preferred feasible choice, two things must then be true in general. First, they must satisfy the budget constraint – they can’t spend more than they have. Second, the optimality condition(s) must hold – it should not be possible to reallocate expenditures so as to reach a more preferred bundle, and no available good should have a value in terms of the other goods consumed that exceeds its price relative to those goods. Let’s now tie this logical and intuitive reasoning to the graphical analysis using the budget constraint from our earlier example, where the budget is $50, one beer costs $1, and one slice of pizza costs $2. The figure below shows the budget line and two indifference curves. Indifference curve III, based on the graph, is unattainable given the budget. Indifference curve I crosses the budget line in two places, points a and b. The line tangent to indifference curve I at point a is shown by the dotted line. The slope of this tangent line represents the rate at which we can trade pizza for beer at that point while maintaining our level of utility; in other words, the slope of the tangent line is the marginal rate of substitution of pizza for beer at that point. Intuitively, it measures how much beer the consumer is willing to give up to get one more piece of pizza at point a.

[Figure: the budget line from 50 beers to 25 pizzas; indifference curve I crosses it at points a and b, and indifference curve III lies beyond the budget line.]


Notice that the tangent line to the indifference curve at point a is steeper than the budget line. Therefore, at point a,

MRSPB > pPizza / pBeer.

Remember, the slope of the budget line represents the rate at which the market will allow us to trade pizza for beer. Since pBeer = $1 and pPizza = $2, the market allows us to trade two beers for one slice of pizza; equivalently, if we wanted one more pizza slice, we would need to give up 2 beers. Because the slope of the tangent line at point a is steeper than the slope of the budget line, the consumer is willing to give up more than 2 beers for one more piece of pizza while staying at the same utility level. Since the consumer only has to give up exactly 2 beers for one more piece of pizza at market prices, he will be better off if he does so. The consumer will buy more pizza and less beer if he starts at point a. As he does so, he has more pizza and less beer, so the value of another slice of pizza in terms of beer falls – the MRSPB diminishes. Eventually, the consumer will reach a point where the MRS equals the price ratio, or the relative value equals the relative cost at the margin. At point b, it’s just the opposite; the slope of the tangent is flatter, which means the consumer is willing to give up less than 2 beers for one more pizza. So, at point b,

MRSPB < pPizza / pBeer.

So, if the consumer buys more beer by giving up pizza, he will be better off, since he gets exactly 2 beers for each pizza given up. As he buys more beer and less pizza, the value of pizza in terms of beer increases, until the value of another slice of pizza in terms of beer equals the relative cost. The consumer, then, will reach the highest indifference curve when the rate at which he’s willing to trade pizza for beer is exactly the same as the rate at which the market allows him to, or when

MRSPB = pPizza / pBeer.

At the consumer’s most preferred feasible option, this optimality condition must be satisfied and the consumer must meet their budget constraint. In the figure below, this occurs at point c. At points on the budget line with more beer, there is an incentive to buy more pizza and less beer, and vice versa.

[Figure: the budget line from 50 beers to 25 pizzas with indifference curves I, II, and III; curve II is tangent to the budget line at point c.]

The observant reader will have noticed that the optimality condition is a valid description due to the preference for variety, the diminishing MRS, or the diminishing value of X relative to Y as the consumer has more X and less Y (these are three ways to say the same thing). Nowhere did we say anything about diminishing marginal utility. The scale of utility is arbitrary, so, in this context, it is completely meaningless to refer to “diminishing marginal utility.” All that matters is the ratio of the marginal utilities and how that ratio changes with the consumption bundle.

Price Changes and Income Shifts
The model above describes determination of a consumer’s individual quantity demanded at a given set of prices and a given income level. As we have explained, the budget line is determined by both the prices of the two goods and the consumer’s level of income. Changes in either of these affect the budget line. Let’s first look at what happens when income changes. Suppose that the consumer’s income increases from m0 to m1 in the figure below. The effect is that the consumer can buy more of both goods if they so choose, but as the slope of the budget line is determined by the prices of the goods – which haven’t changed – the budget line will simply shift out in parallel. The opposite would happen if the consumer’s income fell. In the figure, both X and Y are normal goods for this consumer – he consumes more of each (X0 to X1, and Y0 to Y1) when income increases.

[Figure: a parallel outward shift of the budget line when income rises from m0 to m1; the chosen bundle moves from (X0, Y0) to (X1, Y1).]

The reader should experiment with different shaped indifference curves to verify that it is possible for one of the two goods to be an inferior good, for which consumption will fall when income increases, but not for both goods to be simultaneously inferior. In addition, if the prices of both goods increase by a given factor, it is just like a decrease in income. Or, if both prices fall by a given factor, it is just like an increase in income. For example, if both prices increase by 50%, it is just like income falling by one third. It would be a good idea for the reader to show this graphically and to verify the equivalence algebraically using the equation of the budget line. Finally, if prices and income all increase by the same factor, nothing changes. Suppose the price of good X increases from pX0 to pX1 in the figure below. Since the x-intercept of the budget line is determined by how many units of good X we can buy with a fixed amount of income, an increase in the price means we can buy fewer units. Thus, the budget line pivots inward. As we would expect, a higher price of good X means that the new chosen bundle will contain fewer units of X than the old bundle. If the price of good X were to decrease, exactly the opposite would happen.

[Figure: when the price of X rises from pX0 to pX1, the budget line pivots inward around the Y intercept m/pY; the X intercept falls from m/pX0 to m/pX1 and the chosen quantity of X falls from X0 to X1.]

Note that the change in the price of X affects the demand for Y in the figure. At the higher price of X, in this example, the demand for Y increases. Thus, Y would be thought of as a substitute for X. The reader should experiment with drawing different shaped indifference curves to show that it would be possible for consumption of Y to decrease, in which case the goods are complements, or to remain the same. It is also left to the reader to determine how a change in the price of good Y would affect the budget line and the chosen bundle. There are really two things going on when a price increases. First of all, the relative price changes. At the initial solution and original prices, the value of X in terms of Y equaled the relative cost. Once the price of X increases, the value of X in terms of Y is suddenly less than the cost. That induces the consumer to want to substitute some Y for X. The tendency for substitution induced by a price change is known as the substitution effect. At the same time, the increase in price reduces purchasing power. That means the consumer is poorer. If the good is normal, this will result in a further decrease in the consumption of X. If the good is inferior, this will actually work against the tendency to substitute away from X. The effect on the quantities chosen of the reduction in purchasing power, or real income, due to a price change is known as the income effect. Changes in prices have larger effects on purchasing power if the good accounts for a large share of the consumer’s budget. So, the income effect is more pronounced for goods that are a large share of the budget. The price increase above led to a decrease in the quantity demanded. This is consistent with the law of demand. However, the analysis above, in and of itself, does not “prove” the law of demand. To the contrary, the tools of consumer theory can be used to illustrate the strange case where the law of demand would not hold. While it is doubtful that we would ever observe it in practice, especially at the market level, here is how the theoretical argument goes. Suppose someone very poor spends a lot of their budget on an inferior good. If the price of that inferior good goes up, the substitution effect says buy less, the income effect says buy more. Since the good is a large share of their budget, the income effect could win out! Such a good would be known as a Giffen good. For example, suppose a college student is so poor that they eat instant Ramen noodles for every meal except one each week. For that last meal, they order from the dollar menu at McDonald’s. They regard Ramen noodles as an inferior good. If their income increased a little, they would order from the dollar menu at McDonald’s twice each week. Now, suppose the price of Ramen goes down. With the money they save, they might choose to buy from McDonald’s one more time, and therefore eat less Ramen. This is shown in the figure below.

[Figure: the Giffen case – a fall in the price of X from px1 to px2 pivots the budget line outward, yet the chosen quantity of X falls from x1 to x2 as the consumer moves from bundle 1 to bundle 2.]


This situation is pretty unlikely at an individual level. From the perspective of a market, taken as a whole, it seems almost impossible, because while some may spend a lot of their income on instant Ramen, the fraction of the average consumer’s income spent on it will be too small for the income effect to outweigh the substitution effect. As the reader should be able to tell from the figure, it is even hard to draw! The main reason to cover it is that it is a good extreme case for the reader to use to check their understanding of the theory.

Individual Choice – the Calculus Version
Recall the consumer’s original goal: reaching the highest indifference curve attainable with a limited budget. Since we have shown that utility functions represent indifference curves, this goal is equivalently stated as the consumer maximizing utility subject to the budget constraint. With only two goods, it is possible to solve the budget constraint for Y or X, substitute that expression into the utility function, and then just maximize as usual. In the two-good case, the way to think about the problem mathematically is:

max over X, Y of U(X,Y) subject to pX X + pY Y ≤ m.    (7.8)

How is this problem attacked? Assuming we have defined things so all income is spent, as we did above, the budget constraint could be written as Y = m/pY − (pX/pY)X, so utility would be U(X, m/pY − (pX/pY)X). For example, if the utility function is U = 2XY + X, the price of X is 4, the price of Y is 2, and income is 20, this would become U = 2X(10 − 2X) + X. Maximizing that will give you the consumer’s choice. But, it is not a particularly insightful way to accomplish anything. Instead, we can set up and solve this problem by appending what is known as a Lagrange multiplier to the constraint and creating a new form of the problem known as a Lagrangian. For the two-good case, that looks like the following:

L = U(X,Y) + λ[m − pX X − pY Y].    (7.9)

Why do this? This is basically a “cooked” problem that is structured to account for the constraint by introducing a new choice variable, λ (lambda). With the problem set up this way, we can take three partial derivatives and end up with three equations in three unknowns (X, Y, and λ). Solving those equations will give us the solution to the consumer’s problem. Thus, we can apply what we know about unconstrained optimization to solve the problem once it is written in this way. Taking the partial with respect to X, we have

∂L/∂X = ∂U/∂X − λpX = 0.    (7.10)


Since the partial derivative of utility with respect to X is just the marginal utility of X, the equation becomes

MUX − λpX = 0.    (7.11)

Similarly, the partial with respect to Y yields

MUY − λpY = 0.    (7.12)

Rearranging each equation to isolate λ, dividing the first by the second, and rearranging terms gives

MUX / MUY = pX / pY.    (7.13)

Looking at our result from the first two partials, we see that equation (7.13) is the same optimality condition obtained above. The third and final partial derivative of the Lagrangian with respect to λ is

∂L/∂λ = m − pX X − pY Y = 0.    (7.14)

Rearranging, this is just the budget constraint,

m = pX X + pY Y.    (7.15)

The solutions to equations (7.13) and (7.15) yield the individual consumer’s demand functions for each good in terms of income and the prices of both goods. We could rearrange the first two partials to get an expression for λ. Doing so, we find the following is true at the solution:

λ = MUX / pX = MUY / pY.    (7.16)

This can be interpreted as the marginal utility of the last dollar spent. In other words, it is a measure related to how far up the indifference map the consumer would move with a little more income. Thus, the solution for λ can be thought of as the marginal utility of another dollar of income. Since λ was associated with the income constraint, it should not be surprising that it turns out to tell us something about the utility value of another dollar. You will not have to use the Lagrangian on any exam in Managerial Economics. So, why did we go through it? There are several reasons. The first was to show that what we argued above based on logic, intuition, and graphical analysis is entirely internally consistent. The second is that students who are very mathematically inclined, but who had trouble with the intuition, may get a better picture of the logic having seen the math. Third, many, even most, of the students who go on to graduate school in business disciplines – not just economics but also finance, accounting, operations management, and decision science – will use this technique extensively later on. So, exposure to it at this point may be helpful in preparing you for that experience. Finally, software applications that perform constrained optimization generally rely on this technique. Many of you who move on to careers in business will have to make sense of reports colleagues or consultants have prepared using such tools. Having some knowledge of the technique underlying the solution should help you make sense of such reports, and allow you to evaluate the results more intelligently.
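For readers who want to see the machinery run, here is a sketch using the sympy library to set up the Lagrangian and solve the first-order conditions for the earlier hypothetical example (U = 2XY + X, pX = 4, pY = 2, m = 20).

import sympy as sp

X, Y, lam = sp.symbols("X Y lam", positive=True)
pX, pY, m = 4, 2, 20

U = 2 * X * Y + X
L = U + lam * (m - pX * X - pY * Y)      # the Lagrangian, as in (7.9)

# First-order conditions: partials with respect to X, Y, and lambda
foc = [sp.diff(L, v) for v in (X, Y, lam)]
sol = sp.solve(foc, [X, Y, lam], dict=True)[0]
print(sol)   # X = 21/8, Y = 19/4, and lam, the marginal utility of a dollar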

Summary and Example Problem
In summary, two conditions must be met at the consumer’s most preferred feasible bundle.

1. Optimization condition: The consumer allocates a given budget optimally between good X and good Y: MRSXY = pX/pY, MUX/MUY = pX/pY, or equivalently MUX/pX = MUY/pY.

2. Budget constraint: The consumer cannot spend more than they have: m = pX X + pY Y.

Example: Utility Maximization

Suppose a consumer’s utility function for different combinations of pizza (Z) and beer (B) is given by the function U(Z,B) = 4√B + 2√Z. If the price of beer is $1, the price of pizza is $2, and the consumer has $50 to spend on beer and pizza, find the amount of pizza and beer he will buy.

Solution: We begin by finding the optimization condition. The marginal utility of beer is the partial derivative of the utility function with respect to B, or

MUB = 2/√B

and the marginal utility of pizza is

MUZ = 1/√Z.

The marginal rate of substitution of pizza for beer is the quotient of the two marginal utilities, or

MRSZB = MUZ / MUB = √B / (2√Z).

Now we can equate this to the ratio of the prices of the goods to obtain the optimization condition:

MRSZB = √B / (2√Z) = pZ / pB = 2/1
√B / √Z = 4
B = 16Z.

So, based on our optimization condition, we have found the ratio in which the consumer will consume beer and pizza. We now need the second condition, the budget constraint. Plugging the level of income and the prices of the goods into the budget constraint, we have

50 = 2Z + B.

To solve these two equations simultaneously, substitute the optimization condition into the budget constraint:

50 = 2Z + 16Z
Z = 50/18 ≈ 2.8 and B = 16(50/18) ≈ 44.4.
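As a check on the algebra, the same answer can be obtained with a generic numerical optimizer; the sketch below uses scipy to maximize the example’s utility function subject to the budget.

import numpy as np
from scipy.optimize import minimize

# Maximize U = 4*sqrt(B) + 2*sqrt(Z) subject to 2Z + B = 50
def neg_utility(v):
    Z, B = v
    return -(4 * np.sqrt(B) + 2 * np.sqrt(Z))

budget = {"type": "eq", "fun": lambda v: 50 - 2 * v[0] - v[1]}
res = minimize(neg_utility, x0=[5.0, 5.0], constraints=[budget],
               bounds=[(1e-9, None), (1e-9, None)])
print(res.x)   # approximately Z = 2.78, B = 44.4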



Chapter 7 Terminology
The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Completeness – An underlying assumption of consumer theory that states a consumer must either prefer one bundle to another or be indifferent between them. In other words, preferences are exhaustive: A ≻ B, A ≺ B, or A ~ B.

More is Better – An underlying assumption of consumer theory that states consumers are better off if they have more of a good and no less of the others. This assumption is bounded by the relevant range of what consumers might normally purchase.

Transitivity – An underlying assumption of consumer theory that states if A ≻ B and B ≻ C, it must hold that A ≻ C.

Utility – A metric used to apply math to preference rankings. The units of utility are not important; only the rank order matters.

Marginal Utility – The added satisfaction from consuming one more unit of a good.

Marginal Rate of Substitution (MRS) – The amount of one good a consumer is willing to give up for one more unit of another good.

Diminishing Marginal Rate of Substitution – An underlying assumption of consumer theory that states, as a consumer gets more of one good and less of the other along an indifference curve, the value of the second good increases relative to the first good. This implies a preference for variety.

Indifference Curve – A curve that shows different combinations of two goods that provide the same amount of utility. Indifference curves cannot cross because of transitivity.

Budget Line – A line that shows the combinations of two goods a consumer can afford given income and market prices.



Chapter 8 Applications and Extensions of Consumer Theory

Compensation Indexing and Compensating Differentials
Even though utility is a nebulous term that has no fixed scale, and thus cannot be directly related to a consumer’s actual “happiness”, the theory of individual choice still has many useful applications. Often, firms find it in their interest to relocate employees from one city to another. Since housing and other costs may differ across cities, management often adjusts salaries across cities to compensate employees for such differences. How much should compensation adjust for differences in the costs of housing and other goods and services across cities?

In setting up the model, we need to consider what is most important to include and what we can ignore. For items that are easily transported from one city to another, arbitrage means we should expect the “law of one price” to hold, up to relatively small transportation costs. That means we can ignore variation in the prices of transportable goods, and of any goods and services available on the internet. At the other extreme, land simply cannot be moved from one city to the next, no matter how large the price difference. That, coupled with the fact that housing accounts for such a large share of a typical consumer’s income, means we must account for differences in the price of housing in the model. Services that are highly labor intensive are an intermediate case. Their costs will vary somewhat, due to the effect of cross-city differences in wages, but not as much as housing costs. To keep the model simple, we will ignore the role of variations in service costs across cities. But, in interpreting the model, we should not forget that in reality service costs will vary, too.

We will therefore assume that utility depends on the amount of housing consumed, H, and the amount of everything else consumed, E. So, the utility function is written as U(H, E). The price of housing, or the rental rate per unit of housing, is R. The units of housing are most easily thought of as square feet. Since we are assuming everything other than housing costs the same in all cities, the cost of one dollar’s worth of everything else, pE, is just 1. Thus, the budget constraint is M = RH + E. We will assume, for simplicity, that income is entirely determined by the salary the employee is paid by the firm. Let’s say an employee currently works in Gainesville and his firm would like to relocate him to Atlanta. The current budget line in Gainesville is shown in the figure below.

[Figure: the Gainesville budget line from MG (all everything-else) to MG/RG (all housing), tangent to indifference curve UG at the bundle (HG, EG).]

If they just buy everything else


and no housing, they could consume MG units of E. If they just buy housing and nothing else, they could consume MG/RG units of housing. Given the budget constraint, they can reach the indifference curve UG in Gainesville, and they will consume HG units of housing and EG units of everything else. At the solution, the value of another unit of housing in terms of everything else, MRSHE, is equal to pH/pE, or R:

MUH / MUE = R.    (8.1)

Suppose the firm wants to move the employee to their Atlanta branch, and that the price of housing in Atlanta, RA, is higher than the price in Gainesville. If the employee were offered the same salary, the budget line would pivot in, due to the higher cost of housing. On the new budget line, the employee can no longer reach indifference curve UG, and thus wouldn’t be willing to move (assuming they can find a similar job in Gainesville).

[Figure: at the same salary MG, the higher Atlanta housing price RA pivots the budget line in from MG/RG to MG/RA, leaving indifference curve UG out of reach.]

To get the employee to move, one option would be for the firm to pay enough more in Atlanta to allow the employee to consume the same bundle that they consumed in Gainesville. That level of income, denoted M̂A, is calculated as

M̂A = RA HG + EG .    (8.2)

Graphically, this is a shift of the new budget line out until it reaches the original consumption bundle, (HG, EG). Remember, the slope of the budget line is (−pH/pE), and since pE is $1 and pH is RA, the slope of the new, higher budget line M̂A doesn't change; we're just shifting up the intercept to get back to the original bundle.

Is this the best option from the perspective of the firm? It's true that M̂A will get the employee to move – but is there a cheaper way to accomplish this? Looking at the graph, we see that with the new budget line, corresponding to the level of income M̂A, the employee can actually reach a higher indifference curve than they reached in Gainesville. So, with new income M̂A, the employee will actually be better off. The intuitive reason is that when people move to places with more expensive housing, they economize on housing. That is, they are willing to substitute away from housing when its relative price increases.

You can see that at the new, higher indifference curve, the bundle would include a smaller amount of housing than the employee consumed in Gainesville. Thus, when considering how to compensate the employee for moving at the least possible expense to the firm, we don't have to provide him with the same bundle as he had in Gainesville. Instead, we simply need to make sure he can reach the same indifference curve in the new city (i.e. is just as happy). So, management should raise salary in Atlanta above the level in Gainesville by something less than it takes to reach the same bundle. The necessary salary corresponds to the budget line labeled MA in the figure to the right. This allows the employee to reach the same indifference curve, UG, by buying the combination of goods (HA, EA).

Now, let's say the firm wants to move the employee from Gainesville to the Keys. Housing is more expensive in the Keys. But, the employee really likes the beach, fishing, etc. So, there is an amenity difference – the Keys are a more alluring location than Gainesville to the employee, all else equal. In this case, the indifference curve for housing and other consumption that the employee would have to reach in the Keys to stay just as happy as he was in Gainesville would be below the old indifference curve that he had in Gainesville, because the amenities themselves are compensation for the move, at least in part. We will call this new indifference curve UK=G, since the utility the employee gets from it (when he is in the Keys) is the same as the utility he gets from his old, higher indifference curve (when he is in Gainesville). The situation is shown in the figure to the right.

We know the level of income required to obtain the original Gainesville bundle in the Keys, M̂K in the figure, is more than the firm needs to pay. But, since the employee gets extra utility just from being in the Keys, we don't have to pay him as much as it would cost to reach a point on indifference curve UG. Since UG and UK=G correspond to the same level of utility, we only need to reach the indifference curve UK=G. This takes a salary of MK.

[Figure: the Keys budget lines for M̂K and MK, with the lower indifference curve UK=G giving the same utility as UG once the Keys' amenities are counted.]

Thus, we have seen that differences in housing costs and differences in inherently utility-bearing conditions, called amenities, create differences in the wages paid to similar workers doing similar jobs in different cities. Such wage differences are known as compensating differentials. In locations that are more pleasant or cheaper to live in, wages are lower, all else equal. In places that are cold, dreary, or otherwise unappealing to an employee, workers will demand higher wages, as they will in places that are more expensive. This concept can also be applied to jobs that are risky, difficult, and unpleasant versus jobs that are relaxed, non-stressful, and safe. These types of differences between jobs will be reflected in the wages that firms have to pay.
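To make the compensation calculation concrete, here is a minimal numerical sketch in Python. It assumes a specific Cobb-Douglas utility function, U(H, E) = H^a · E^(1−a) – an assumption made purely for illustration, not something derived in this chapter – and all of the salary and rent figures are made up.

# A minimal sketch of compensation indexing, assuming Cobb-Douglas
# utility U(H, E) = H**a * E**(1 - a). All numbers are illustrative.

a = 0.3          # assumed share of income spent on housing
M_G = 50_000.0   # Gainesville salary (hypothetical)
R_G = 10.0       # Gainesville rent per square foot (hypothetical)
R_A = 15.0       # Atlanta rent per square foot (hypothetical, higher)

# With Cobb-Douglas utility, the optimal choices are H = a*M/R and
# E = (1 - a)*M, so maximized utility is V(M, R) = M * (a/R)**a * (1-a)**(1-a).
H_G = a * M_G / R_G          # housing chosen in Gainesville
E_G = (1 - a) * M_G          # everything else chosen in Gainesville

# Salary that lets the employee buy the SAME bundle in Atlanta:
M_hat_A = R_A * H_G + E_G

# Salary that leaves the employee EQUALLY WELL OFF in Atlanta:
# V(M_A, R_A) = V(M_G, R_G)  =>  M_A = M_G * (R_A / R_G)**a
M_A = M_G * (R_A / R_G) ** a

print(f"same-bundle salary M-hat_A: {M_hat_A:,.0f}")   # 57,500
print(f"equal-utility salary M_A:   {M_A:,.0f}")       # about 56,467
assert M_A < M_hat_A   # substitution away from housing saves the firm money

The gap between the two salaries is exactly the saving from the employee's willingness to substitute away from housing when its relative price rises, the same logic as in the figures above.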

Individual Choice, Individual Demand, and Market Demand
Our analysis of consumer theory allowed us to find the quantity of a good a consumer purchases at given market prices and income. We were also able to determine how changes in market prices affect the quantity the consumer purchases. By tracing out the quantity of good X a consumer purchases at different prices, we arrive at that consumer's individual demand curve. The figure to the right shows how this works. For a fixed level of income m and a price of good Y of pY, we have three different prices for good X: pX1 is the most expensive, pX2 is cheaper, and pX3 is the cheapest. We can see, based on consumer optimization, that as the price of good X increases, the quantity that the consumer demands decreases.

[Figure, upper panel: optimal bundles for prices pX1 > pX2 > pX3, with fixed income m and price pY, giving quantities X1 < X2 < X3.]

If we transfer these three points of consumption to a graph with price on the vertical axis and quantity demanded on the horizontal, we will obtain a graph of this individual's demand curve for good X, as shown in the lower panel of the figure. As we can see, an increase in the price of good X tends to decrease the quantity of good X this consumer demands. While Giffen goods, for which this is not true, could theoretically exist for some consumers in some rare cases, this is the usual case. Suppose the market for good X contained only this consumer, whose demand curve we have just modeled. Then, the market demand curve for good X would be identical to the consumer's demand curve. Typically, however, there is more than one consumer in a market. How do we go from individual to market demand?

Assume the market for good X consisted of three consumers. For a given price of good X, each customer would demand a certain quantity. At each price, the market demand curve represents the total quantity demanded by all the consumers in the market. This is shown in the figure at right. The three individuals' demand curves are shown by d1, d2, and d3. The quantities of good X that consumer 1 demands at each price are shown by x1, x2, and x3, but the other two consumers also demand a certain amount at each price. The market demand curve, DX, is simply the sum of these three consumers' quantities demanded at each price; in other words, it is the horizontal summation of the individual consumers' demand curves. This generalizes to any market demand curve, regardless of how many consumers there are.

Our conclusions from consumer theory about changes in the prices of related goods and income levels also carry over to market demand. Recall from our analysis of indifference curves and budget lines that if the price of good Y were to increase, the individual consumer would not only buy less of good Y, but he would also substitute into good X, which is now cheaper relative to the initial price levels, if the goods are substitutes. Thus, if good Y is a substitute for X to enough consumers, the demand curve for good X would shift outward, from DX to DX' in the figure to the right. Similarly, if the price of good Y were to decrease, the demand curve for good X would shift inward.
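Horizontal summation is easy to illustrate with a short sketch. The three linear individual demands below are made-up numbers, used only to show the mechanics.

# Horizontal summation of individual demand curves, with three assumed
# linear demands of the form x_i(p) = max(0, a_i - b_i * p).

individual_demands = [
    (10.0, 1.0),   # consumer 1: x1 = 10 - 1.0p
    (8.0, 0.5),    # consumer 2: x2 = 8 - 0.5p
    (6.0, 2.0),    # consumer 3: x3 = 6 - 2.0p
]

def market_demand(p):
    # Sum quantities demanded at a common price p; each individual
    # quantity is truncated at zero once the price exceeds a_i / b_i.
    return sum(max(0.0, a - b * p) for a, b in individual_demands)

for p in (1, 2, 3, 4):
    print(f"p = {p}: Q = {market_demand(p):.1f}")

Note that the summation runs over quantities at each price, not over prices, and the market curve kinks at any price where one consumer's quantity demanded hits zero (here, at p = 3, where consumer 3 drops out).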

Finally, recall that if a consumer's income, m, were to increase, he would buy more of good X at all prices if it is a normal good, and less if it is inferior. Assuming a good is a normal good for most, if not all, of those who consume it, an increase in market income, M, would shift the demand curve for good X outward. However, if it is inferior for enough consumers, an increase in market income, M, shifts demand inward. A decrease in income works in the opposite way.

Willingness to Pay and Consumer Surplus
Generally, as the price of a good increases, the customer responds by buying less; but how much worse off is he because of it? Or, equivalently, how much better off is a consumer when he can buy a product at a lower price? Let's start with the simplest possible case, where a consumer either buys one unit of a good or none. The consumer has a reservation price reflecting the value they place on the good. If the price of the good is above this reservation price, the consumer is not willing to buy the good. If it is at or below this reservation price, the consumer is willing to buy the good. We call this reservation price the consumer's willingness to pay for the good, denoted v.

Often, consumers are able to purchase goods at a price that is below their maximum willingness to pay. For example, imagine a consumer values a certain car at $20,000, but the car is currently listed at a price of $17,000. Because the consumer is able to pay $17,000 for a car that he values at $20,000, he would be $3,000 better off. This dollar measure of how much better off a consumer is because he gets to purchase a good at the prevailing market price is called his consumer surplus.

The calculation of consumer surplus is simple when a consumer has a choice between buying either one unit of a good or zero units. When a consumer potentially buys more than one unit, things become more complicated. In this context, consumer surplus is a very simplified welfare measure based on consumer theory, but it is used extensively. To be exact, we would have to relate the measure of consumer welfare to consumer theory more precisely and speak in terms of compensating differentials for price differentials. Instead, we start by letting v(q) denote a consumer's total willingness to pay for q units.

Thus v(10) represents the maximum amount the consumer would pay in total for 10 units of the good if his alternative were to purchase zero units. We then assume the consumer behaves as if to maximize the difference between this total willingness to pay for q units and what he actually pays (at price p per unit), which is the consumer's surplus, cs(q):

cs(q) = v(q) − pq .    (8.3)

Maximizing consumer surplus with respect to quantity gives

dcs/dq = dv/dq − p = 0 .    (8.4)

Letting v′(q) denote the derivative of willingness to pay, or marginal willingness to pay, this could be written as

p = v′(q) .    (8.5)

This says a consumer will continue buying as long as their marginal willingness to pay, or their willingness to pay for just a little more, exceeds the cost of a little more. At any point where the value of another unit exceeds (is less than) the cost, there is an incentive to buy more (less). That is, this tells us the price at which the consumer will buy q units. So, it follows from this that marginal willingness to pay is the individual consumer's inverse demand curve in this model.

Let's consider an example. The first two columns of the table below show an inverse demand curve. The third column adds up the marginal willingness to pay for each unit to get (total) willingness to pay. The last column shows consumer surplus if price is 4 at each possible quantity.

q    p(q) = v′(q)    v(q)    cs(q)
1          7            7       3
2          6           13       5
3          5           18       6
4          4           22       6
5          3           25       5

For the first unit, we can see that his maximum willingness to pay is 7; as he only pays 4, he has a surplus of 3. If 2 are purchased, total willingness to pay is 13 (7+6) and consumer surplus is 5 (13−8). The third unit brings total willingness to pay to 18 and purchasing it brings consumer surplus to 6. The fourth unit increases willingness to pay and total payments in equal amounts, leaving consumer surplus unchanged. As we will discuss more below, this is simply because the price exactly equals marginal willingness to pay at 4 units AND because we are assuming discrete units. If units were both valued and available in fractions, or if the price were 3.50, the fourth unit would increase consumer surplus. Purchasing more than 4 would lower consumer surplus, since the marginal value falls below the market price.
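The table's arithmetic can be reproduced directly from the marginal willingness to pay schedule, as this short sketch shows.

# Rebuild the willingness-to-pay table: cumulate marginal willingness
# to pay v'(q) into total willingness to pay v(q), then compute
# consumer surplus cs(q) = v(q) - p*q at a price of 4 per unit.

marginal_wtp = {1: 7, 2: 6, 3: 5, 4: 4, 5: 3}   # q -> v'(q), from the table
price = 4

total_wtp = 0
print("q  v'(q)  v(q)  cs(q)")
for q, mwtp in sorted(marginal_wtp.items()):
    total_wtp += mwtp                 # v(q) = v(q - 1) + v'(q)
    cs = total_wtp - price * q
    print(f"{q}    {mwtp}     {total_wtp:2d}    {cs}")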

The figure below shows the example from the table above in graphical form. Since we have discrete units (we did not specify the value for half a unit, or a tenth of a unit), the shaded areas correspond to consumer surplus – the amount by which willingness to pay exceeds the actual payment. Willingness to pay in the figure would include not only the surplus, but also the amount paid.

[Figure: the discrete marginal willingness to pay schedule v′(q) = p(q) for q = 1 to 4, with the surplus above the price of 4 shaded.]

Now that we know the demand curve tells us the consumer's marginal willingness to pay for the qth unit, it is clear that v(q), his total willingness to pay for all q units, is the total area under the curve out to q. With a linear demand curve, this area can be calculated with simple geometry, using triangles and rectangles. For those comfortable with the concept of an integral, it is also the integral of the (inverse) demand curve, p(q), up to the quantity purchased. Letting x be the dummy variable of integration, this is written as

v(q) = ∫0^q p(x)dx .    (8.6)

If we subtract the total payment for q goods at a price of p, the expression for consumer surplus is

cs(q) = ∫0^q p(x)dx − pq .    (8.7)

This is the area in the graph that's under the demand curve and above the current market price. Market consumer surplus is nothing more than the summation of all the individual consumer surpluses, or

CS(Q) = Σi csi(qi) = ∫0^Q p(x)dx − pQ .    (8.8)

Market consumer surplus can still be found geometrically, as it is simply the area below the market demand curve, and above the market price.

Example: Consumer Surplus

Part 1
Suppose a consumer's inverse demand is given by p = 10 − 0.4q and the current price is $2. How much does the consumer buy, and what are their consumer surplus and willingness to pay at that quantity?
Solution: The consumer purchases where their marginal willingness to pay equals the market price:

2 = 10 − 0.4q
0.4q = 8
q = 20

The easiest way to find cs(q) is to graph the demand curve and calculate the area under it and above the price at the chosen quantity. Consumer surplus is just the area of the resulting triangle. The base of the triangle is 20 and the height is 10 − 2 = 8, so consumer surplus is

cs = 20(8)/2 = 80 .

We also observe that the total amount actually paid by the consumer is

pq = 2(20) = 40 ,

so total willingness to pay is

v(q) = 80 + 40 = 120 .

We could have found the same answer using integration.

Part 2
Assume there are 100 identical consumers in the market. What are the total willingness to pay of all consumers and market consumer surplus when each consumer maximizes their consumer surplus and price is 2?
Solution: The total value of all units sold and market consumer surplus are simply 100 times the per-consumer values in this case: V = 100(120) = 12,000 and CS = 100(80) = 8,000.
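As a check on Part 1, the sketch below recomputes v(q) by numerically integrating the inverse demand curve rather than using the triangle geometry; only the numbers given in the example are used.

# Verify the consumer surplus example: p(q) = 10 - 0.4q, price = 2.

def inverse_demand(q):
    return 10 - 0.4 * q

price = 2.0
q_star = (10 - price) / 0.4        # buy where p = v'(q), so q* = 20

# v(q*) is the area under p(q) from 0 to q*, here via a midpoint Riemann sum.
n = 100_000
dx = q_star / n
v = sum(inverse_demand((i + 0.5) * dx) for i in range(n)) * dx

cs = v - price * q_star
print(f"q* = {q_star:.0f}, v(q*) = {v:.1f}, cs = {cs:.1f}")   # 20, 120.0, 80.0

# Part 2: with 100 identical consumers, both totals scale by 100.
print(f"market: V = {100 * v:,.0f}, CS = {100 * cs:,.0f}")    # 12,000 and 8,000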

Chapter 8 Terminology
The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Compensation Indexing – A tool used by managers to determine how much more or less they must pay an employee when relocating them. It takes into account housing prices in the different regions, the cost of everything else, and the employee's amenity preferences.

Compensating Differential – The additional amount of income needed by an employee to do an undesirable or dangerous job, or to work in an undesirable or expensive town.

Amenity – A desirable feature of an area to which an employee is relocating. As amenities in a certain town increase, the extra income a manager must pay an employee for relocating there decreases.

Consumer Surplus – The benefit received by consumers who can buy a product for less than their willingness to pay for it. Approximately, it is the triangular area under the demand curve and above the market price.

Chapter 9 Non-Linear Pricing

Block Pricing and Two-Part Pricing
In the last chapter, we developed the concepts of (total) willingness to pay and consumer surplus. In this section, we apply those ideas to study more advanced pricing strategies. Consider the standard way a firm with market power maximizes profit. Assume, for purposes of the current discussion, that they are selling to a single customer, who is a price taker. The figure to the right is the standard depiction. The monopolist sets MR = MC and sells the profit-maximizing quantity qMON by charging a price of pMON. The area above cost and below price is the firm's profit (π); the area above price and below the demand curve is consumer surplus (cs).


What does the triangle labeled DWL represent? Note that units to the right of qMON but to the left of q* have a higher value to the consumer than the marginal cost of producing them (that is, p(q) > MC) and aren't being sold. Put differently, total value added by the firm could be increased by producing and selling to the point where the marginal willingness to pay, or marginal value of another unit, is equal to marginal cost. However, to do so, the firm would have to lower the price charged per unit to marginal cost. This would reduce profits, since marginal revenue is less than price for a firm with market power. In the figure, with a constant per unit cost, this would eliminate profits altogether. The fact that the firm's profit-maximizing output falls short of the level that would maximize value added results in a deadweight loss (DWL).

So, from the firm's point of view, there are two related problems with the picture above. First, there is value added left to be had by increasing sales to q*, but they can't get any of it because they would have to lower price. Second, consumers get to keep some of the value added by the firm in the form of consumer surplus. Block and two-part pricing attempt to rectify some of these deficiencies that are inherent in simple linear pricing from the firm's point of view. The only way the firm can get rid of the deadweight loss is to sell every unit that has a higher value to consumers than its cost to produce; in other words, it must sell where marginal willingness to pay equals marginal cost. But, with simple linear pricing, lowering price below the monopoly level lowers profit. The only way to take away consumer surplus is to charge them more, but if the linear price is higher, they buy less and profit is lower.

The question is – how to sell more while simultaneously charging the customer enough in total to capture not only the new value added, but also most of the existing consumer surplus as profit, leaving the consumer with little? Simple linear pricing is simply not up to that challenge.

Using block pricing, the firm "bundles" together q* units of its goods and charges a single price P for the entire bundle, or package. Since it is selling q* units as a bundle, and charging a single price for the entire package, intuition tells us that it should charge a consumer his total willingness to pay for q* units. This is depicted in the figure to the right. By charging one price for the block of units (for example, a 24-pack of Coke), the firm captures the customer's total willingness to pay in the form of profit, leaving no value added in the form of consumer surplus.

[Figure: the block of q* units, where p(q*) = MC, sold at the single package price P = v(q*), the entire area under the demand curve.]

If there are n identical customers who each buy one bundle for price P, profit is

π = nP − C(nq*) ,    (9.1)

where nq* is the total amount of individual units (i.e. single cans of coke) produced. This is shown in the figure to the right. The portion of the consumer’s willingness to pay that does not cover costs is left as profit. Since value added is maximized, this is as large a profit as the firm could possibly make from one customer.

[Figure: the customer's total willingness to pay for q* units, split into the cost C(nq*) below MC and the profit π above it.]

Now let's look at the math behind block pricing, which we generalize to allow for both non-constant marginal cost and multiple (n) identical customers. Each customer still buys a single bundle of q units at a price of P for the whole bundle. The firm's total cost is C(nq). The firm wishes to maximize profit, which is total payment less cost, or

π = nP − C(nq) ,    (9.2)

subject to the constraint that the total charge for a bundle cannot exceed a customer’s willingness to pay, or

P ≤ v(q) .

Since the firm wants to capture as much value added as possible, it would not set the bundle price less than the consumer's maximum willingness to pay, since that would leave some surplus for the consumer. Thus, it wants the two to be equal, which means we can substitute v(q) for P in our original profit function:

π = nv(q) − C(nq) .    (9.3)

Maximizing profit, we get

dπ/dq = n(dv/dq) − (dC/d(nq))(d(nq)/dq)

Note that dv/dq is marginal willingness to pay, or p(q); dC/d(nq) is the rate at which cost changes for a one unit change in total quantity, or marginal cost; and d(nq)/dq is the rate at which total quantity changes for a one unit change in the amount per bundle, or n (since increasing each bundle by one unit will result in n more total units being produced). We can now rewrite our derivative as

dπ/dq = np(q) − nMC = 0 , or    (9.4)

p(q) = MC .    (9.5)

Notice that this is the same conclusion we saw graphically; that is, we want to produce where marginal willingness to pay equals marginal cost. Solving this gives the optimal bundle size q*, which can be used to find the price of the bundle P:

P = v(q*) .    (9.6)

Two-part pricing accomplishes the same thing in a different way. In two-part pricing, there is a price per unit p, and a fixed fee f that is paid for the right to purchase the goods. An example of a firm that uses this method is Sam's Club, where a fee buys a monthly or annual membership granting access to the goods within the store at a low per-unit price. Notice that this pricing strategy is infeasible if resale is possible, since customers could make a profit by buying a membership, paying for goods at the low price, and reselling them on the market for a slightly higher price, saving other consumers the membership fee.

The idea is to choose the per unit price so that consumers will choose the quantity that maximizes value added, and then to charge a membership fee high enough to capture all consumer surplus in the form of profit. In order to sell q* units, the firm must set price equal to marginal cost. Then, the consumer surplus that remains, cs(q*), is captured as profit by setting the fixed fee equal to that amount, as shown in the figure for the case of one consumer.

Now let’s look at the math behind two‐part pricing, which we generalize to allow for both non‐constant marginal cost and multiple identical customers. Since there are n consumers who each buy q units at a price of p per unit, and also pay a fixed fee f, our profit function is

π = n(f + pq) − C(nq)

subject to the two constraints:

1. f + pq ≤ v(q), which says the total price the customer pays must not be greater than their total willingness to pay for q units, and
2. p(q) = p, or v′(q) = p, which are two equivalent ways of saying that the consumer will maximize their individual surplus.

In other words, customers will buy a certain quantity at a certain price based on their own marginal willingness to pay – the firm cannot dictate both price and quantity independently. Substituting the first constraint (holding with equality) into our profit function gives

π = nv(q) − C(nq) ,

which is the same form it took in the block pricing example. Therefore, we know the solution will be the same when it is maximized, namely, the quantity q* at which p(q*) = MC. We can then plug q* into the constraints to find p and f as

p* = MC(q*)    (9.7)

and

f = v(q*) − p*q* .    (9.8)
Example: Block and Two‐Part Pricing

Suppose there are n identical customers with willingness to pay v(q) = 10q − 0.25q^2. If the firm has a constant marginal cost of $2, find profits using simple linear pricing, the bundle price and profits using block pricing, and the fixed fee and profits using two-part pricing.

Solution: With simple linear pricing, the firm will set marginal revenue equal to marginal cost. Inverse demand is marginal willingness to pay, p = v′(q) = 10 − 0.5q, so

π = n(10 − 0.5q)q − 2nq
dπ/dq = n(10 − q − 2) = 0
q = 8, p = 6
π = n(6 − 2)8 = 32n

For both block and two-part pricing, the firm needs to sell the quantity q* that maximizes total surplus. Total surplus is value minus cost:

TS = n(10q − 0.25q^2) − 2qn
dTS/dq = n(10 − 0.5q − 2) = 0
q* = 16

With block pricing, the price of the bundle is equal to the total willingness to pay for 16 units:

P = v(16) = 10(16) − 0.25(16^2) = 96
π = n(96 − 2(16)) = 64n

With two-part pricing, the per-unit price is equal to marginal cost, and the fee is equal to the leftover consumer surplus:

p = 2
f = cs = v(16) − pq = 96 − 2(16) = 64
π = 64n

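A quick numerical check of this example; the functions below just restate the calculus above, so any disagreement would signal an algebra slip.

# Check the pricing example: v(q) = 10q - 0.25q**2, constant MC = 2,
# profits reported per customer (multiply by n for the market).

def v(q):
    return 10 * q - 0.25 * q ** 2

MC = 2.0

# Simple linear pricing: demand is p = v'(q) = 10 - 0.5q, and the
# first-order condition gives q = 8, p = 6.
q_lin, p_lin = 8.0, 6.0
print("linear:  ", (p_lin - MC) * q_lin)        # 32.0 per customer

# Block and two-part pricing both sell where v'(q) = MC, so q* = 16.
q_star = (10 - MC) / 0.5
print("block:   ", v(q_star) - MC * q_star)     # 64.0 per customer

fee = v(q_star) - MC * q_star                   # two-part fee = surplus at p = MC
print("two-part:", fee)                         # 64.0 per customer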
One of the main assumptions for both of these pricing models was that the customers were identical. Typically, this is not the case, and different customers will have different valuations of a firm’s products. When this is the case, a single fixed fee or bundle will not capture as much consumer surplus as when all the customers were identical. If the firm is able to segment its customers into separate groups based on their willingness to pay, it could charge them each a separate “membership” fee and extract the maximum amount of surplus from each customer group. For example, when discounted membership fees are offered to seniors at a golf club, this is exactly what they are doing – charging them less based on their willingness to pay. It is not always possible to separate different customer groups by their willingness to pay, or if it is, it may not be feasible to charge them accordingly. If a

firm cannot explicitly separate its customers, it has a few options. The first is to continue using two-part pricing, but to adjust the fee and per-unit price. When doing this, there are two tradeoffs the firm must balance: i) as the firm increases the fixed fee, in order to capture more consumer surplus from the consumers with the higher willingness to pay, it may cut some of the consumers with the lower willingness to pay out of the market completely; ii) as the firm increases the per-unit price, in order to capture more revenue, it will lose sales per customer, but customers won't drop out of the market completely. This flexibility of two-part pricing means the firm has options in the presence of customers that aren't identical. The second option the firm has is to use "menu pricing" to get customers to segment themselves. By constructing different bundles that have specific quantities and prices based on different customer groups' willingness to pay, the firm can effectively charge different customers different prices, thereby increasing its overall profits. This topic is the subject of the next section.

Menu Pricing
In block and 2-part pricing, we assumed we had identical customers. If we had multiple customer types, but still had complete knowledge about their willingness to pay, we would be able to segment the market and simply use discriminating block or 2-part prices. We talked briefly about how a two-part price might be determined when there are multiple consumer types but the seller cannot identify them explicitly. In other words, what if there is asymmetric information about customer demand – the customer knows their individual demand type and the firm does not? This is a form of adverse selection. As we briefly discussed when considering block pricing, it is possible to set a single block price, perhaps based on some sort of "average" customer's willingness to pay. But, this will cause customers that have "less than average" willingness to pay to reject the bundle entirely. Another approach is to use 2-part pricing, but now we have to consider "small" value customers and "high" value customers. When choosing a single fee and a single per-unit price for multiple customer types, there is a trade-off between lowering the fixed fee and raising the per-unit price in order to gain some small value customers, and raising the fixed fee and lowering the per-unit price in order to capture more surplus from high value customers. Intuitively, the ideal is when these effects essentially offset each other, but we did not derive the precise solution. Two-part pricing is thus more flexible, and so more profitable, than block pricing in this setting. Both of the above methods are ways of coping with multiple customer types using a single pricing mechanism; however, with a more sophisticated pricing system – menu pricing – it is possible to do better.

Menu pricing attempts to deal with the problem of asymmetric information about customer types by offering multiple bundles at different prices, with a different bundle and price designed for each type of customer. There is no way to force a customer to buy the intended bundle, due to incomplete information (i.e. there is no potential for 1st degree price discrimination), so the firm must design the bundles so that customers will voluntarily make the choice the firm intended.

With menu pricing, if there are two consumer types, high value customers and low value customers, but the firm can't tell them apart, the firm offers two bundles. The first bundle is targeted towards high value customers, and thus has a high quantity q, a high bundle price P, and a low price per unit p, where p = P/q. This should make intuitive sense – a customer that values a product a lot will, for a given price, want to buy a lot of the product. Thus, this customer will be more attracted to a bundle that has a large quantity of the product at a low price per unit. The second bundle is targeted towards low value customers; these customers aren't concerned with buying copious amounts of the product, and so will buy a bundle with a low quantity q, a low bundle price P, and a high price per unit p = P/q. The reason they are willing to pay a higher price per unit is that they aren't buying that many units to begin with, and as such will avoid the higher total price of the first bundle that's intended for the big customers.

This reveals a subtle but important point as to why profits won't be as high with menu pricing as they would be if the firm had complete knowledge about the market and were able to engage in perfect price discrimination – charging each customer their maximum willingness to pay. Since the high value customers have a higher willingness to pay than the low value customers, regardless of how the small bundle is designed, the high value customers will always value it more than the low value customers; in other words, as long as the low value customers are willing to buy the small bundle, the high value customers will always get some surplus from buying the small bundle. When designing the big bundle, the firm would like to charge a bundle price P equal to the high value customers' total willingness to pay, just as we showed when we introduced block pricing – to get it all as profit. But, if the firm did this, leaving the high value customers zero surplus from buying the big bundle, they would simply buy the small bundle, since they will always get some surplus from that. So, when designing the big bundle, the firm must allow the high value customers to retain some of their consumer surplus to ensure that they will buy it. This is why menu pricing is not as profitable as discriminating and offering each bundle only to the customers it is designed for.

Now let's define terms with regard to the menu pricing problem:
nH: the number of high demand customers, whose value is VH(qH) each
nL: the number of low demand customers, whose value is VL(qL) each
Total cost: C(nLqL + nHqH), where nLqL + nHqH is the total quantity sold
Two bundles, one big (PH, qH) and one small (PL, qL)

Using this notation, profit is

π = nLPL + nHPH − C(nLqL + nHqH)

subject to the following constraints:

(1) PL ≤ VL(qL): You cannot charge the low type more than their total willingness to pay, or they won't buy.
(2) PH ≤ VH(qH): You cannot charge the high type more than their total willingness to pay, or they won't buy.
(3) VL(qL) − PL ≥ VL(qH) − PH: The low value customers must get at least as much consumer surplus from their bundle as they do from the big bundle.
(4) VH(qH) − PH ≥ VH(qL) − PL: The high value customers must get at least as much consumer surplus from their bundle as they do from the small bundle.

Constraints (1) and (2) are called participation constraints because customers will not participate if you violate them. Based on what was said earlier, we know we won't ever be able to charge the high value customers their total willingness to pay, since they could just buy the low value bundle and retain surplus; so, the second constraint is non-binding, in the sense that it will be automatically satisfied given the setup of the rest of the problem. The first constraint, though, will "bind." That is because we can make more profit from charging the small customer a higher package price as long as they buy. So, we will want to charge them the highest price possible, equal to their total willingness to pay. Any higher and they would not buy. So, constraint (1) actually constrains our choice, and holds with equality, while constraint (2) does not.

Constraints (3) and (4) are called selection constraints because customers will select the "wrong" bundle if you violate them; they are also sometimes called incentive constraints. We know V(q) − P is consumer surplus, so the above constraints can be rewritten as

(3′) CSL(qL) ≥ CSL(qH)
(4′) CSH(qH) ≥ CSH(qL).

Since we never need to worry about the low value customers buying the big bundle, constraint (3) will never be violated, and we can ignore it. We would like to charge the big customer their whole willingness to pay, leaving them no surplus. But, as we've already discussed, trying to do so would cause the high value customers to buy the small bundle. So, constraint (4) is binding: we will be tempted to violate it, it will hold with equality, and we cannot ignore it.

Since we've narrowed these four constraints down to the two that bind, (1) and (4), let's look at these two again.

Constraint (1) says that the price of the small bundle should be less than or equal to the low value customers' total willingness to pay. Is there any reason the firm should charge them a lower price than that? Since they don't have any other option (i.e. they won't consider the big package), no. Thus, the firm should set the total package price equal to their total willingness to pay, or PL = VL(qL). (Another reason equality should hold is that the lower the price on the small bundle, the more consumer surplus a high value customer keeps by buying the small bundle, and so the more likely it is that the high value customer will buy the small bundle.) Constraint (4) says high value customers should get at least as much consumer surplus from the big bundle as they would from the small bundle. Is there any reason they should get more? Remember, the objective is to incentivize the high value customer to buy the big bundle; but giving them any more consumer surplus than that does nothing but reduce profits for the firm. Thus, equality also holds for the fourth constraint, or VH(qH) − PH = VH(qL) − PL. Rearranging constraint (4) and solving for PH, we get

PH = VH(qH) – VH(qL) + PL.

This says we can set the price for the big package PH equal to the price of the small package PL plus any extra value the high value customer gets from buying the big package over the small package, VH(qH) − VH(qL). Substituting the first constraint into the rearranged fourth constraint, we get

PH = VH(qH) – VH(qL) + VL(qL).

Substituting PL (from constraint (1)) and PH (from constraint (4), in its final form) into our original profit function, we get

π = nLVL(qL) + nH(VH(qH) − VH(qL) + VL(qL)) − C(nLqL + nHqH) .

Remember, qH and qL are the sizes of the big and small bundle, respectively – which the firm chooses. Maximizing with respect to qH gives:

∂π/∂qH = nH(dVH/dqH) − nH(dC/dQ) = 0 .

We can factor out nH, and, recognizing that dV/dq is price and dC/dQ is MC, we obtain

pH(qH) = MC .

We know that when marginal willingness to pay equals marginal cost, we are maximizing value added. This is shown in the figure below, where the quantity that maximizes value added for the big demander is qH*. So, the result says for high value customers, we should sell the quantity that maximizes value added. Maximizing with respect to qL we obtain

∂π/∂qL = nH(−dVH/dq(qL) + dVL/dq(qL)) + nL(dVL/dqL) − nL·MC = 0 .

Because dVH/dq(qL) is pH(qL), the high value customers' marginal willingness to pay at the low quantity, and dVL/dq(qL) is pL(qL), the low value customers' marginal willingness to pay at the low quantity, we have

nL(pL(qL) − MC) = nH(pH(qL) − pL(qL))

and dividing by nL we obtain

pL(qL) − MC = (nH/nL)(pH(qL) − pL(qL)) .

Consider the right side of the equation. We know that for a given quantity, high value customers will pay more than low value customers; so, pH(qL) − pL(qL) must be positive. Since the right side is positive, it follows that pL(qL) > MC. Look at a graph of the inverse demand curves for each customer type, where we assume marginal cost is constant to keep the graph as clear as possible. From our first partial derivative, we found pH(qH) = MC; so, the quantity in the big bundle will be qH*, and this maximizes value added. The quantity that maximizes value added for the small bundle is qLe (e for socially efficient), since that is where pL(qL) = MC; but, in our second partial derivative we found that pL(qL) > MC. So the actual quantity of the small bundle, qL*, must be somewhere where the price is higher than the cost; in other words, to the left of qLe. The intuitive reason we don't maximize value added with the small bundle is that we are unable to separate our customers. If we were able to separate them and charge each customer type by their willingness to pay, we'd simply maximize value added and capture the entire consumer surplus using either block or 2-part pricing. In menu pricing, we cannot separate our customer types, so the small bundle quantity doesn't maximize value added.
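For readers who want to verify the two first-order conditions symbolically, here is a small sketch using the sympy library. It assumes a constant marginal cost c and leaves VL and VH as abstract functions, so it is a check of the general algebra rather than of any particular demand.

# Symbolic check of the menu pricing first-order conditions,
# assuming a constant marginal cost c. Requires sympy.
import sympy as sp

q_L, q_H, n_L, n_H, c = sp.symbols('q_L q_H n_L n_H c', positive=True)
V_L, V_H = sp.Function('V_L'), sp.Function('V_H')

# Binding constraints: P_L = V_L(q_L), and P_H = V_H(q_H) - V_H(q_L) + P_L.
P_L = V_L(q_L)
P_H = V_H(q_H) - V_H(q_L) + P_L

profit = n_L * P_L + n_H * P_H - c * (n_L * q_L + n_H * q_H)

# d(profit)/dq_H = n_H*(V_H'(q_H) - c): the big bundle sets p_H(q_H) = MC.
print(sp.factor(sp.diff(profit, q_H)))

# d(profit)/dq_L = (n_L + n_H)*V_L'(q_L) - n_H*V_H'(q_L) - n_L*c, which
# rearranges to p_L(q_L) - MC = (n_H/n_L)*(p_H(q_L) - p_L(q_L)).
print(sp.expand(sp.diff(profit, q_L)))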

Example
Type 1 customers have a willingness to pay of V1 = 10q1 − q1^2/2 and type 2 customers have a willingness to pay of V2 = 15q2 − q2^2/2, nL = 10 and nH = 5, and the cost function is C(n1q1 + n2q2) = 2(n1q1 + n2q2). (Type 2 customers are "high value" since for any given quantity they have a higher willingness to pay.)

There are two binding constraints. The first is that the price of the small bundle must equal the low value customers' willingness to pay, or

P1 = V1(q1) = 10q1 − q1^2/2 .

Second, the high value customers must get the same amount of consumer surplus (or slightly more) from buying their bundle as from buying the small bundle, or

V2(q2) − P2 = V2(q1) − P1
P2 = 15q2 − q2^2/2 − 15q1 + q1^2/2 + P1 .

Substituting the first constraint in for P1 and simplifying, we obtain

P2 = 15q2 − q2^2/2 − 15q1 + q1^2/2 + 10q1 − q1^2/2 = 15q2 − q2^2/2 − 5q1 .

Since we have an expression for the price of each bundle as a function only of the quantities, we can now set up a profit function:

π = 10(10q1 − q1^2/2) + 5(15q2 − q2^2/2 − 5q1) − 2(10q1 + 5q2) .

To find the quantities of the two bundles, take the partial derivatives. We'll start with q2 since it only occurs in three places:

∂π/∂q2 = 5(15 − q2) − 2(5) = 0
15 − q2 = 2
q2 = 13

Now, take the derivative with respect to q1:

∂π/∂q1 = 10(10 − q1) + 5(−5) − 10(2) = 0
100 − 10q1 − 25 − 20 = 0
10q1 = 55
q1 = 5.5

To find the prices of the bundles, plug the quantities into the equations P1 and P2. Our earlier conclusion was that the quantity in the small bundle would be less than the quantity that maximizes value added. We can check that by setting the low value customers’ marginal willingness to pay equal to the marginal cost:

dV1/dq1 = 10 − q1 = 2 = MC

q1 = 8

Since 8 > 5.5, this fits. The following graph illustrates the values in this example. In essence, we are restricting the quantity of the small bundle to make it less likely a high value customer will be tempted to buy it. This keeps the high value customers “honest,” in that they will purchase the bundle that was designed for them – the big one.

[Figure: the inverse demand curves pL(q) and pH(q) with MC = 2, marking the small bundle quantity 5.5, the efficient low-type quantity 8, and the big bundle quantity 13.]
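The same answer can be reached without calculus; the brute-force sketch below searches a grid of bundle sizes and lands on q1 = 5.5 and q2 = 13, confirming the algebra.

# Numerical check of the menu pricing example: V1 = 10q - q^2/2 for the
# 10 low types, V2 = 15q - q^2/2 for the 5 high types, marginal cost 2.

n1, n2, mc = 10, 5, 2.0

def V1(q): return 10 * q - q ** 2 / 2
def V2(q): return 15 * q - q ** 2 / 2

def profit(q1, q2):
    P1 = V1(q1)                    # binding participation constraint (1)
    P2 = V2(q2) - V2(q1) + P1      # binding selection constraint (4)
    return n1 * P1 + n2 * P2 - mc * (n1 * q1 + n2 * q2)

# Brute-force search over bundle sizes in steps of 0.1.
candidates = ((i / 10, j / 10) for i in range(0, 101) for j in range(0, 161))
q1_best, q2_best = max(candidates, key=lambda qs: profit(*qs))
print(q1_best, q2_best, profit(q1_best, q2_best))   # 5.5 13.0 573.75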

To intuitively wrap up what’s going on with menu pricing, let’s try to see the concepts using a graph. We know the quantities that maximize value added, qHE for the big bundle and qLE for the small bundle, occur where marginal value equals marginal cost. If we let the small bundle have quantity qLE, the low value customers’ total willingness to pay, which is the price of the small bundle (PL), is shown in the left figure below. What is profit from the small demanders? We’re getting the entire area VL(qLE) as revenue, and our (variable) costs are simply MC×qLE, so profit is the triangle labeled “π” in the next figure.

[Figures: left panel – the low value customers' total willingness to pay VL(qLE), the area under pL(q) up to qLE; right panel – the profit π from the small bundle, the triangle between pL(q) and MC.]

Suppose we go through the same process for the big bundle, and set the price of the big bundle PH as the high value customers’ total willingness to pay for qHE units. Now, when a high value customer buys the big bundle, he gets no consumer surplus (since the price is exactly his total willingness to pay); however, if he buys the small bundle, his total willingness to pay is shown in the figure on the left. Since the price of the small bundle is only the low value customers’ total willingness to pay [shown in an earlier graph as VL(qLE)], the high value customers’ consumer surplus from buying the small bundle is shown in the next figure.

[Figures: left panel – VH(qLE), the high value customers' total willingness to pay for the small bundle; right panel – CSH, the surplus a high value customer keeps when buying the small bundle at price VL(qLE).]
Since the high value customers get this amount of consumer surplus from buying the small bundle, but none from buying the big bundle if its package price were to equal their entire willingness to pay, they would never buy the big bundle. Thus, we cannot set the price of the big bundle as the high value customers’ total willingness to pay. In order for the high value customers to buy the big bundle, they must receive at least the same amount of consumer surplus as they would if they bought the small bundle. Their total willingness to pay for qHE units is shown in the left figure below. The price we can charge them is this area, minus the consumer surplus they would get from buying the small bundle, as shown in the right figure below.

CSH

VH(qHE)

MC pL(q)

qLE

qHE

MC

PH

pH(q)

q

qLE

pL(q)

pH(q)

q

qHE

Our profit from the big bundle, then, is just the price (PH) minus the cost (everything that falls below MC). This is shown in the left panel of the next figure.

π

+∆πH

CSH’

CSH

‐∆πL

π

MC

C(qHE) qL

E

pL(q)

qH

E

MC ∆qL

pH(q)

pL(q)

q

qLE

qHE

pH(q)

q

Our earlier claim was that this is not the way to maximize profit. We stated that restricting the size of the small bundle would actually increase our overall profit. So, let’s look at our profit if the size of the small bundle is less than qLE. Lowering qLE lowers the amount of surplus we have to let the high demand customer retain. In the figure on the right above, we’ve lowered the quantity of the small bundle by ∆qL.

182


Since we can only charge the low value customers their total willingness to pay, and we’ve shrunk the size of the bundle, we can no longer capture the gray triangle as profit. Therefore, the triangle is lost profit from our low type customers (labeled ‐ ∆πL in the graph). Since we’ve shrunk the size of the small bundle, the consumer surplus the high value customers would get if they bought the small bundle shrinks to CSH’, which means the consumer surplus we must leave them when pricing the big bundle shrinks to that value as well. Thus, we can charge more for the big bundle, particularly by the amount labeled ∆πH (the loss of the high value customers’ consumer surplus from buying the new smaller bundle). Now lets assume there is just one of each type of consumer (for now, just to make the graphical analysis simple). Since the increase in profits from the high value customers is bigger than the decrease in profits from the low value customers (∆πH > ∆πL), total profit is greater. So, now we know the size of the bundle for the small demander is less than qLE, but how much? The first thing is to notice that in the previous graph, the change in profits ∆π for either customer type is at the margin; in other words, it’s the difference in profits for lowering the quantity of the small bundle by a very small amount. In the graph, the change in quantity ∆qL is bigger to illustrate the point, but it’s important to understand that the theory applies for minute changes in quantity. Since we know we want to keep lowering qL as long as ∆πH > ∆πL, it follows that we want to keep lowering the quantity until these two are equal at ∆πH the margin; in other words, until the profit we lose from the low value ‐∆πL customers is exactly equal to the MC profit we gain from the high value pL(q) pH(q) customers for a marginal (tiny) q change in qL. This is shown in the qHE qL* qLE figure to the right. The optimal quantity for the small bundle is qL*. Remember, we are assuming there is one customer of each type (or, that there are equal numbers of each type). If this were not the case, you would have to take that into account when deciding how much profit to take away from the low value customers. If we think back to our first order condition that we found when first deriving our profit function, we had

pL (qL ) − MC =

nH (pH (qL ) − pL (qL )). nL

Looking at the above graph, we can connect our conclusions to this mathematical equation. The loss in profit from low value customers if we decrease qL slightly, ∆πL, is the difference between their marginal willingness to pay and marginal cost, or pL (qL ) − MC , which is just the left‐hand side of the equation. The gain in profit from high value customers from decreasing qL slightly, ∆πH, is the difference between

183


their marginal willingness to pay and the low value customers’ marginal willingness to pay, or p H (qL ) − pL (qL ) , which is just the right‐hand side of the equation when nH = nL. So, we see that the graph relates to our first derivative of profit. Furthermore, since the equation has nH/nL on the right‐hand side, we can analyze what would happen if the ratio of high to low value demanders were not equal to 1. If the number of high value customers increases, then the ratio nH/nL will increase, which means the profit we take we gain from high value customers (on the margin) is more important relative to the profit we lose from low value customers. Since we gain profit from high value customers by lowering the quantity of the small bundle, the quantity in the small bundle will be less (compared to qL* in the figure). If the number of high value customers decreases, then the ratio nH/nL will decrease, which means the profit we lose from the low value customers (on the margin) is more important relative to the profit we gain from the high value customers. Since we lose profit from low value customers by lowering the quantity of the small bundle, the quantity in the small bundle will be greater (compared to qL* in the figure). The way we’ve introduced menu pricing has been through offering different size packages at different package prices; essentially, multiple customer block pricing. You can also use 2‐part pricing in a similar way, by offering multiple combinations of fees and per unit prices (and, often, other sorts of benefits as well). Customers then choose which kind of “membership” they want to have. Using the same principles from bundle pricing, we know the quantity we want the low value customers to buy will be less than the efficient quantity, qL* in the figure. To get them to buy that amount, the price must f be pL in the figure. Then, we set their fixed pL L fee as their consumer surplus (the triangle MC labeled fL). We still want to sell the efficient quantity to the high value pH pL E q customers (qH ). The price that gets them qHE qL* qLE to buy that quantity is pH, which is the marginal cost.

184


The consumer surplus the high value customers will get if they buy the small membership and pay the low fee is labeled CSH in the first graph below. Since they could get this amount of surplus by choosing the option intended for the low demand type, they must keep this amount of surplus if they choose the option intended for them, or they will not choose it. Therefore, the high fee will be their total consumer surplus when price equals marginal cost, less that amount of surplus. This is shown as fH in the second figure below. So instead of offering two different size bundles for two different prices, we're now offering two different memberships, one with fee fL that gets you a price of pL, and another with fee fH that gets you a price of pH, where fL < fH but pL > pH.

[Figures: left panel – CSH, the surplus a high value customer keeps under the low membership (fee fL, per-unit price pL at quantity qL*); right panel – the high fee fH, the high type's surplus at price pH = MC over qHE units, less CSH.]

An important point to take away is that we always want the high value customers to buy the quantity that maximizes value added, i.e. the socially efficient quantity. This problem is analogous to a number of adverse selection problems in other areas of economics, such as choosing income tax rates. If the government has a certain amount of revenue it wants to raise in the form of tax, and workers can be classified as “high productive” workers and “low productive” workers, the government will want to get most of its tax revenue from the “high productive” workers, as they provide a greater share of the taxable income (just as we wanted to take most of our profits from the high value customers, as they have the highest willingness to pay). The problem is, increasing the marginal tax rate on the “high productive” workers will cause them to want to act like “low productive” workers, so they can escape the higher tax rates (just as the high value customers in our problem were tempted to buy the small bundle in order to get more consumer surplus). The conclusion is that you want the marginal tax rate on the “most productive” worker to be 0, in order to incentivize that worker to continue being productive (take on another project, etc.) and not to be less productive; in our example, it’s that you want the high value customers to buy a large quantity, and not to be tempted to buy a small quantity. That does not mean you want their TOTAL tax payments to be low, or even their AVERAGE tax rate to be low. Indeed, they will pay the highest total tax and quite possibly the highest share of their incomes as tax. It just means that, at the margin, since they are the ones that can earn the most and therefore pay the most in taxes, you want to avoid giving them the incentive to produce less – which would mean the tax burden would have to be higher on others with less ability to pay to raise the same total revenue.

Chapter 9 Terminology
The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Asymmetric Information – A state in which one party knows more than others.

Adverse Selection – A case of information asymmetry in which one party's characteristics are hidden from another party.

Consumer Surplus – The benefit received by consumers who can buy a product for less than their willingness to pay for it. Approximately, it is the triangular area under the demand curve and above the market price.

Block Pricing – A way of avoiding deadweight loss by setting the quantity where value added is maximized (where marginal willingness to pay equals marginal cost), then extracting all of the surplus as profit by bundling the units and setting a single price for that bundle. The bundle price equals the customer's total willingness to pay for the bundle.

2-Part Pricing – A way of avoiding deadweight loss by setting the per-unit price and quantity where value added is maximized (where marginal willingness to pay equals marginal cost), then extracting all of the surplus as profit by charging a fixed fee that grants the customer the right to purchase the goods. This fee is equal to the consumer surplus at that price.

Menu Pricing – A way for a firm to maximize profits when there are different types of customers and the firm is unable to identify and separate them into groups. The firm deals with this asymmetric information about customer types by offering multiple bundles at different prices, or different fixed membership fees, with higher fees allowing customers to buy the good at a lower per-unit price.

Participation Constraint – A constraint that must hold in order for a party to participate. In the case of menu pricing, the participation constraints are that the price for the low type consumer must be less than or equal to his willingness to pay, and the price for the high type consumer must be less than or equal to his willingness to pay.

Incentive/Selection Constraint – A constraint that must hold in order for a party to act a certain way or buy a certain membership or bundle. In the case of menu pricing, the incentive constraints are that the surplus received by the high and low type consumers for buying the package or paying the fee meant for them must be at least as much as the surplus they would receive from buying the other package or paying the other fee.

Chapter 10 Uncertainty with Risk Aversion

When discussing firms making decisions under uncertainty, we assumed they behaved in a risk-neutral manner. This was both because one decision made by one firm represents a very small portion of a well-diversified portfolio, and because it is the simplest way of dealing with the consequences of uncertainty when the degree of risk aversion is not, in its own right, particularly important to the decision. This isn't the case with individuals, though, because as sole holders of the gambles that we experience (chance of sickness, bonus packages, etc.), we bear the full amount of the risk. Therefore, we generally assume that individuals are risk-averse.

Expected Utility
Suppose there are different possible outcomes for any given scenario, and each outcome has an associated wealth level. Let wi denote the wealth level in outcome i, where i goes from 1 to n possible outcomes, and let fi be the probability of the ith outcome. Given this information, the expected utility theorem says there exists a utility function u(w) such that the option with the highest expected utility,

E(u) = f1u(w1) + f2u(w2) + f3u(w3) + … ,

or just

EU = Σi fi u(wi) ,

is chosen.

We'll expound on what this theorem is really saying, but it's important to understand that this is just a model. The point of the model is not to provide a detailed explanation for how everybody will act all the time; instead, think of it when looking at the market as a whole. We're assuming events in the market often play themselves out as if consumers, when making decisions, act in this way. The four major assumptions we had for consumer theory (completeness, more is better, transitivity, and preference for variety) also apply to this model. In addition, there is one more important assumption:

1. Independence Axiom: If A is preferred to B, then the compound lottery giving A with probability f and C with probability (1 − f) is preferred to the compound lottery giving B with probability f and C with probability (1 − f).

To explain the independence axiom, let's use an example.

Remember, we're talking about individuals making decisions about gambles, so A, B, and C are different gambles that the individual is faced with. Let's assume that you have a car, and you have to buy car insurance. There is some chance the car will be wrecked and you will face a large repair bill. Gamble A is if you buy minimal insurance, and gamble B is if you buy extensive insurance. Assume that we prefer A to B. The axiom says that if we introduce some new gamble C in equal weights to both of the original gambles, it shouldn't affect our preference of A over B. Suppose C is a given probability that you will die before taking delivery of your new car covered by the policy, and the complementary probability that you will not. Adding this gamble to both of the other gambles, creating a compound lottery, shouldn't change the fact that you prefer A over B; so, you will choose A&C over B&C. This is in essence what the axiom is saying.

Let's look at an example. Suppose an individual's utility function is U = 10√w, and they are faced with a choice between A and B, where

A: f = 1/2, w = $0 and f = 1/2, w = $100

B: f = 1,w = $36

For a risk‐neutral firm, the expected value of the gambles is

EVA = .5(0) + .5(100) = 50

EVB = 36

The expected utility of gamble A is

EUA = .5(10√0) + .5(10√100) = 0 + 50 = 50

and the expected utility of gamble B is

EUB = 1(10√36) = 10(6) = 60

So, the expected value (or payoff) is higher for gamble A, but the expected utility is higher for gamble B. This is a result of the individual’s risk‐aversion; even though the expected payoff is higher for A, this individual gets a higher utility from B because there’s no risk in the payoff – it’s $36 for sure. Let’s look at another example. The utility function is the same as in the previous example, but now the gambles are C and D, where

C: f = .8,w = $36 and f = .2,w = $0

D: f = .5,w = $64 and f = .5,w = $0

The expected values are

EVC = .8(36) + .2(0) = 28.8



EVD = .5(64) + .5(0) = 32

which means a risk‐neutral firm would choose D over C. The expected utilities are

EUC = .8(10√36) + .2(10√0) = 8(6) = 48

EUD = .5(10√64) + .5(10√0) = 5(8) = 40

so the individual will prefer C over D. Again, this consumer prefers the gamble with less risk (smaller variation in potential outcomes) despite its lower expected payoff. This is not true in general; you have to work out each consumer's expected utility using his or her individual utility function to find out whether the lower variation is worth the lower expected payoff.

Now that we know how to calculate expected utilities, we need one more tool in order to convert them back into values of wealth that have a more concrete meaning. The certainty equivalent of a gamble is the certain amount of wealth that gives you the same utility as the gamble gives you. For a risk neutral entity, such as a firm, the certainty equivalent equals the expected value of the gamble (CE = EV). For a risk averse entity, such as an individual, the certainty equivalent is strictly less than the expected value of the gamble (CE < EV). To find the certainty equivalent of a gamble for a risk averse individual, set the utility of the certainty equivalent equal to the expected utility of the gamble, or

u(CE) = Σi fi u(wi)

and since the certainty equivalent is an amount of wealth, this gives us a way to assign a monetary value to a gamble. Let's look at an example. Suppose we are faced with a gamble where there is a 50% chance we get a payoff of $100, and a 50% chance we get a payoff of $0. Assuming our utility function is still u = 10√w, we have

u(CE) = .5(10√100) + .5(10√0) = 50

10√CE = 50 (plugging CE into our utility function)

√CE = 5, so CE = 25

So this individual would be indifferent between the gamble described above and receiving $25 for sure. Recall that the expected value of the gamble was $50; someone who was risk neutral would value the gamble at $50.
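The calculations above are easy to script. Here is a minimal Python sketch (the function names are our own, not from the text) that reproduces the numbers for the 50/50 gamble over $0 and $100 under u(w) = 10√w:

import math

def u(w):
    # The utility function used in this chapter's examples: u(w) = 10*sqrt(w).
    return 10.0 * math.sqrt(w)

def u_inverse(x):
    # If x = 10*sqrt(w), then w = (x/10)**2.
    return (x / 10.0) ** 2

gamble = [(0.5, 0.0), (0.5, 100.0)]    # (probability, wealth) pairs

ev = sum(f * w for f, w in gamble)     # expected value: 50.0
eu = sum(f * u(w) for f, w in gamble)  # expected utility: 50.0
ce = u_inverse(eu)                     # certainty equivalent: 25.0
print(ev, eu, ce, ev - ce)             # the last value, EV - CE = 25, is the risk premium discussed below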



Constructing a Utility Function for Uncertain Outcomes

Earlier we introduced the concept of expected utility. But "expected utility" may still seem like a vague abstraction that is hard to relate to. To make the notion more concrete, consider the following scenario. Suppose you are faced with a gamble where there is some probability f that you receive $100, and probability 1 − f that you receive $0. The expected utility of this gamble is

E(u) = fu(100) + (1 − f)u(0).

The certainty equivalent is the single, certain amount of wealth that provides the same utility as the gamble. In other words, the individual values this amount of wealth as equivalent to the gamble – which is why it is called the certainty equivalent. Notice that a certainty equivalent only exists in relation to a particular gamble; there is no such thing as an overall certainty equivalent for a utility function. The certainty equivalent for our example, then, is given by

u(CE) = fu(100) + (1 − f)u(0).

Now that we have a function describing our expected utility of a gamble and our certainty equivalent for that gamble, we can experiment with different values of the probability f to see how they affect these values. Remember, we are completely indifferent between the certainty equivalent and the gamble, since the utility each provides is the same. Keeping that in mind, consider the following thought experiment. Consider each of the levels of wealth in the left column of the table below (say in thousands of dollars). Then, ask yourself how high the probability of winning 100 (f) would have to be to make you indifferent between the gamble and the certain wealth level. The second column gives the answer to this question for a hypothetical individual, but you should go through the thought experiment and find your own values, too. If one had only $0 for sure, one would be willing to take the gamble even if f were 0; that is, we'd be indifferent between $0 for sure and a gamble with a 0% chance at $100 and a 100% chance at $0. Similarly, if one could have $100 for sure, one would only be willing to take the gamble if f were 1; with any less than a 100% chance of winning $100, we'd keep our original $100, assuming risk aversion.

CE      f
0       0
16      .4
25      .5
49      .7
81      .9
100     1

Now let's consider a certain wealth of $16. Ask yourself: if you had $16 for sure, how good a shot at $100 would you need before giving that $16 up? This is exactly what this value of f represents. So, for this individual, suppose f would have to be .4; that is, he would be indifferent between $16 for sure and a gamble with a 40% chance of $100 and a 60% chance of $0. Moving down the table we see the rest of this individual's probabilities, which are just the values of f at which he'd be indifferent between keeping the initial wealth level and taking the gamble.



Going through this exercise is equivalent to defining a utility function. If you can assign values of f that would make you just as happy with the gamble as you would be with the initial value of wealth, you have essentially defined your utility function. To find this individual’s utility function, let’s begin by defining the utility of $0 as 0, and the utility of $100 as 100. The units for utility have no intrinsic value, so defining them this way (0 units and 100 units) simply sets the scale in a convenient way. (We will consider the allowable transformations of the utility function in more detail in the next section. But, basically, multiplying by a constant and adding a constant will not change the expectation or the certainty equivalents.) Since the certainty equivalent is

u(CE) = fu(100) + (1 − f)u(0),

plugging in 0 for U(0) and 100 for U(100), gives

u(CE) = f·100 + (1 − f)·0,

or just

u(CE) = 100f.

This makes it clear that identifying the probabilities for hypothetical gambles that make them equivalent to specified wealth levels IS the same thing as identifying the utility function itself (assuming the individual's preferences satisfy the standard assumptions, in particular the independence axiom). Going back to our table (recreated below) and observing each initial wealth value's respective probability, we can solve for utility by plugging f into the above equation. This is shown in the third column. Remember, the units of utility are arbitrary; so, as long as you can determine the different probabilities for each initial wealth value, you can define a utility function.

CE      f      u(CE) = 100f
0       0      0
16      .4     40
25      .5     50
49      .7     70
81      .9     90
100     1      100

[Figure: the resulting utility function, u plotted against w.] The figure illustrates the case where f is .5; that is, where we have a 50% chance at $100. The expected value of this gamble is $50, and the expected utility turns out to be 50 units (due to the way we scaled our utility function). The table tells us the CE of this gamble is 25, and this is shown on the graph; notice that the utility of the CE is 50, which is also the utility of the gamble. That is simply because we chose the scale (which is arbitrary) so that u = 100f, which is also the expected value for this example.
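Here is a small Python sketch of this elicitation procedure (our own illustration, with the table's hypothetical answers hard-coded): given the indifference probabilities, the utility function follows immediately from the scaling u(0) = 0, u(100) = 100.

# Elicited indifference probabilities: wealth level -> the f that makes the
# individual indifferent between that wealth for sure and the 0/100 gamble.
elicited_f = {0: 0.0, 16: 0.4, 25: 0.5, 49: 0.7, 81: 0.9, 100: 1.0}

# With the scale u(0) = 0 and u(100) = 100, u(CE) = f*100 + (1 - f)*0 = 100f.
utility = {w: 100.0 * f for w, f in elicited_f.items()}

for w, uw in sorted(utility.items()):
    print(f"u({w}) = {uw}")  # reproduces the third column of the table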



[Figure: the same utility function, with the risk premium marked between CE = 25 and EV = 50.] In the figure, the horizontal distance between the expected value of the gamble, 50, and the certainty equivalent, 25, is labeled as the risk premium. The risk premium is just the amount an individual is willing to pay to get rid of the risk they face. Suppose this individual has an initial wealth of 100 but faces a 50% chance of losing it all. Expected losses are 50 (100 times 0.5), so the expected value of their wealth is only 50. But the individual would happily give up another 25 to be rid of the risk – the risk premium. Observe that the CE is determined by the curvature of the utility function. If this utility function were more concave, the single wealth amount that gives a utility of 50 (the CE) would be less than 25. Thus, any operation on a utility function that changes its curvature (such as squaring) creates a new utility function that is entirely different from the original, because it will produce different certainty equivalents. To stress the point that the scale chosen in the above example – u(0) = 0 and u(100) = 100 – had no impact on the utility function, let's look at a more general example. Let u(w) be our utility function, and suppose we are faced with a gamble where we get $60 with probability f and $10 with probability (1 − f). Our expected wealth is just

E(w) = f·60 + (1 − f)·10

and expected utility is

E(u) = fu(60) + (1 − f)u(10).

Since the probability f is associated with the payoff of $60, expected wealth will be f percent of the way from 10 to 60. If there's a 50% chance of $60, E(w) will be halfway from 10 to 60; if there's a 75% chance of $60, E(w) will be 75% of the way from 10 to 60. The point is to realize that the same is true for expected utility. This is why we draw a straight line from [10, u(10)] to [60, u(60)]: however close E(w) is to 60, that's how close E(u) will be to u(60). [Figure: u(w) with the chord from (10, u(10)) to (60, u(60)); E(u), the CE, and the risk premium are marked for f of about 2/3.] The risk premium is the difference between expected wealth and the certainty equivalent. The greater the risk premium, the more risk averse an individual is. Consider the utility functions in the figures below. In the graph on the left, u2 has sharper curvature than u1; thus, the risk premium for u2 will be greater, so the individual



with the utility function u2 is more risk averse than the individual with the utility function u1. [Figures: left, concave utility functions u1 and u2, with u2 more sharply curved; right, a linear utility function u3 and a convex utility function u4.]

In the graph on the right, u3 is a straight line, so the risk premium will be 0 (this individual values gambles at their expected wealth). This individual is risk-neutral. Since u4 is convex, not concave, the risk premium will actually be negative; in other words, the certainty equivalent will be greater than the expected wealth of the gamble. Because of this, the individual with the utility function u4 is risk loving.

Uniqueness and Scale of the Expected Utility Function

Recall that in the last chapter, the units of utility were arbitrary; all that mattered was the ranking of the different consumption bundles. As a result, any increasing transformation of the original utility function represented the same preferences – add to it, multiply it by a constant, take the log, square it, whatever. The units, level, and scale of the expected utility function are also arbitrary. But any transformation must keep the ranking of expected utilities the same, not just the ranking of the utility of wealth. Also, and equivalently, it must keep the certainty equivalents the same. What that means is that any increasing transformation of the expectation of the utility function, E[u(w)] = Σi fi u(wi), is fine. For example, in theory taking the

natural log, ln(E[u(w)]) = ln(Σi fi u(wi)), would give the same preferences across gambles, although it is exceedingly hard to conceive of any circumstance where that particular transformation would be at all helpful. Of more interest, though, are transformations of the utility function itself, u(w). We showed above that the certainty equivalents are determined by the curvature of the utility function, so any transformation that changes the curvature represents different preferences. But the scale of the utility function, that is the units and level of utility, does not matter. So, u(w) and a + bu(w), where a and b are constants with b > 0, give the same certainty equivalents and preserve the ranking of expected values. Mathematically, it is relatively easy to demonstrate this is the case:



E[a + bu(w)] = Σi fi (a + bu(wi))
             = a Σi fi + b Σi fi u(wi)
             = a + bE[u(w)]

since the probabilities sum to one, Σi fi = 1.

In our numerical examples above, the utility function was u = 10w^0.5. Adding any constant or multiplying by any positive constant would give the same certainty equivalents. But if we were to divide by 10 and then square, we would simply have (u/10)² = w, which represents risk neutrality!
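A quick numerical check of this claim (a sketch of our own, using the 50/50 gamble over $0 and $100): an affine transformation a + b·u(w) leaves the certainty equivalent at 25, while squaring the utility function changes it.

import math

gamble = [(0.5, 0.0), (0.5, 100.0)]

def certainty_equivalent(u, u_inv):
    eu = sum(f * u(w) for f, w in gamble)
    return u_inv(eu)

# Original utility u(w) = 10*sqrt(w); inverse is w = (u/10)**2.
ce_original = certainty_equivalent(lambda w: 10 * math.sqrt(w),
                                   lambda x: (x / 10) ** 2)

# Affine transformation 3 + 2*u(w): undo the affine map, then invert u.
ce_affine = certainty_equivalent(lambda w: 3 + 2 * 10 * math.sqrt(w),
                                 lambda x: ((x - 3) / 2 / 10) ** 2)

# Squaring changes curvature: (u(w)/10)**2 = w is linear in wealth.
ce_squared = certainty_equivalent(lambda w: (10 * math.sqrt(w) / 10) ** 2,
                                  lambda x: x)

print(ce_original, ce_affine, ce_squared)  # 25.0 25.0 50.0 -- squaring implies risk neutrality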

The Value of Insurance

In the example above, where u = 10w^0.5, if an individual had an initial wealth of 100 but faced a 50% probability of losing it all, their expected wealth would be 50 and the certainty equivalent 25, leaving the risk premium at 25. That means the individual would accept 25 for sure in exchange for their gamble. A risk neutral firm, on the other hand, would be willing to pay 50 for it. So, if the risk can be reallocated from the risk averse individual to a risk neutral firm, the potential gains from trade are 25, ignoring the costs of facilitating the transaction itself for now. This concept is the basis for insurance markets. If a risk neutral insurance industry serves identical customers with independent risks, the value added by the insurance industry is just the number of consumers times the value of insurance per consumer (the risk premium), less the costs of writing and administering the policies. That is:

Value Added = Number Insured × ( EV - CE - Administrative Costs per Insured ) .

The distribution of that value between insurance firms and customers depends on the market structure. If the firms have market power, they will capture some of it as profit. If the firms are perfectly competitive, the price they charge for a policy will simply reflect the cost of a policy. This will include expected losses plus the costs of writing and administering a policy. That is, in a competitive insurance industry,

Policy Price = Expected Losses + Administrative Costs per Insured.

Consumers are guaranteed a wealth level equal to their initial level less the policy price. If a loss is incurred, they are fully compensated. Without insurance, the value of their gamble is the CE. The gain, or surplus, to each customer is

Consumer Surplus = Initial Wealth - Policy Price - CE .

Since initial wealth less expected losses is just expected wealth, this means

Consumer Surplus = EV - CE - Administrative Cost per Insured .
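Continuing the numerical example, these accounting identities are easy to check in Python. This is our own sketch; the administrative cost figure is an arbitrary assumption.

# Numbers from the running example: initial wealth 100, a 50% chance of
# losing it all, CE = 25; the administrative cost is an assumed figure.
initial_wealth = 100.0
expected_losses = 0.5 * 100.0                 # 50.0
ev = initial_wealth - expected_losses         # expected wealth: 50.0
ce = 25.0
admin_cost = 5.0                              # assumption for illustration

policy_price = expected_losses + admin_cost             # competitive price: 55
consumer_surplus = initial_wealth - policy_price - ce   # 100 - 55 - 25 = 20
assert consumer_surplus == ev - ce - admin_cost         # the same quantity
print(policy_price, consumer_surplus)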



In a perfectly competitive insurance industry, consumers capture all value added as consumer surplus. The ability of the insurance industry to add economic value stems from the fact that insurance companies value gambles at their expected value. Since individuals are risk averse, their certainty equivalent is less than the expected value of the gamble, so insurance companies create value based on the difference between the expected value and the certainty equivalent – the risk premium. Why do insurance companies value gambles at their expected value? The reason individuals have certainty equivalents less than the expected wealth level is that they are risk averse – they get less utility in the face of uncertainty. Insurance companies, though, can pool risk by insuring many clients, which allows them to act as risk-neutral entities. Consider the following simple example to illustrate this point. Suppose any individual can make $100,000 next year, but there's a 10% chance they will get sick and lose $90,000, ending up with only $10,000. For the individual, the prospect of ending up with only $10,000 is devastating, since it will directly affect the amount of money he has for food, rent, etc. In other words, the possibility that he may get sick and lose most of his income is crippling for that individual, since his income is what he depends on. This is what we mean when we say individuals are risk averse. He doesn't care that the expected wealth of the gamble is .9(100,000) + .1(10,000) = $91,000, because if he gets sick, he will be in trouble. Suppose his certainty equivalent is $80,000. This means he'd be willing to take $80,000 for sure in order to pass that risk on to someone else (the insurance company). Now, the insurance company has many clients. Suppose each client faces the same gamble as the one described above. If the insurance company had only one client, it would face exactly the same risk as the individual, and would behave in a similarly risk averse manner: with only one client, the probability that all of the company's clients get sick is still 10%, costing it $90,000. But say the insurance company has 10,000 clients. Then, due to the sheer number of clients, on average only about 10% of them (1,000 clients) will get sick. With so many clients, it is very unlikely that the share who get sick will be much above or below 10%; the other roughly 90% won't file claims. In other words, the probability of a devastating outcome – a large fraction of clients filing claims at once – is much, much lower than when the company had one client. This is why insurance companies can afford to just look at the expected wealth of the gamble. Through risk pooling, they have drastically reduced the chance that they will have to pay out claims to many more than 10% of their clients. In this way, insurance companies diversify away their uncertainty through risk pooling. This is why they act as risk neutral entities. In the previous example, the insurance company was able to diversify away the risk by adding more clients because each client's risk was independent; that is, the probability that one client got sick and filed a claim was completely separate from



the probability that another client got sick. This is an important assumption for risk pooling. If the risks were not independent, the diversification would not be effective. Imagine providing windstorm insurance for homes on the coast in Florida. If each home has a 10% chance of being destroyed by a hurricane, you cannot diversify away that risk by insuring many nearby homes, since a single hurricane can destroy all of them. So, a company that provides windstorm insurance must insure separate geographical locations (i.e., independent risks) in order to diversify its risk properly.
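The following Python sketch (our own illustration) simulates the pooling argument: with independent 10% risks, the fraction of 10,000 clients who get sick concentrates tightly around 10%, so the insurer's payout per client is nearly certain.

import random

random.seed(0)
n_clients = 10_000
p_sick = 0.10
loss = 90_000

# One simulated year: count how many independent clients get sick.
n_sick = sum(1 for _ in range(n_clients) if random.random() < p_sick)

share_sick = n_sick / n_clients
print(f"share of clients sick: {share_sick:.3f}")      # very close to 0.10
print(f"payout per client: {share_sick * loss:,.0f}")  # close to 9,000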

Limitations of the Expected Utility Model and Rational Man Models

We've been talking about the expected utility model and how it allows us to model the choices of rational individuals facing uncertainty. There are limits on how accurate this model really is. Similarly, there are limits to the accuracy of any rational model of decision making, with or without uncertainty. We discuss three apparent violations of simple rational man models: the endowment effect, the Allais paradox, and the Ellsberg paradox.

The endowment effect is the phenomenon where a person values a good more after they have it than before they had it. In other words, a person places a certain amount of value on a good in the market, buys it, and then as a result of now owning it places a higher value on the good. The following is an example. Suppose you are in a store looking at shoes, and decide a certain pair is worth $50. You pay $50 for the shoes, and when you leave the store someone approaches you and offers you $51 for them, but you decline the offer. This is the endowment effect; the mere fact that you now own the shoes has made them more valuable to you.

An experiment conducted at Cornell illustrates this effect. A classroom was given a survey asking whether they would want a candy bar or a coffee mug, two items that have roughly the same market value. About half of the class wanted a coffee mug, and about half wanted a candy bar. Then, the items were distributed randomly to the class, so that half were given a candy bar and half were given a coffee mug. Since the items were distributed randomly, about 25% of the class should have ended up with a coffee mug they wanted, 25% with a candy bar they wanted, and the other 50% with the item they did not want. Then, the class was allowed to trade their items freely. It was predicted that the other 50% would want to trade their items, but in fact nobody traded. This, and other experiments of a similar nature, is claimed to be evidence of the endowment effect: the people who didn't get what they wanted now valued their items more highly simply because they had them.

The problem with this experiment, though, is that it ignores the transaction costs of trading. The transaction cost of a trade is anything other than the actual prices of the goods that it costs you to facilitate the transaction. So, in the example of the



Cornell experiment, the transaction cost of one student trading with another would be getting up, introducing himself, trading the item, and going back to his seat. Also, since the goods are worth relatively little (only a couple of bucks), the gains from trade are small. Since some students may be shy, or may have better things to do with the time they were given to trade, the transaction costs may be too high relative to the small value they would gain by getting the good they wanted. Thus, for trade to occur, the gains from the trade must outweigh the transaction costs.

In light of this problem with the Cornell experiment, a researcher by the name of John List conducted a different experiment. List looked at the online market for collectible trading cards, such as baseball cards. The transaction costs for trading online are relatively low, and the stakes for getting certain cards that are worth a lot are high, so this setting ensures that the gains from trade outweigh the transaction costs of trading. The point of the experiment was to see whether or not this market is efficient. What he found was that traders who were beginners didn't trade when they should have, but traders who were more experienced were pretty efficient. So, there may have been some evidence of the endowment effect among the novice traders, in the sense that they were nervous about other traders taking advantage of them and so were unable to properly value the cards they had. But overall, the market for cards was pretty efficient, since there was a group of traders who were experienced and knew how to properly value their cards. This is analogous to the stock market, and the takeaway is that in markets with a significant number of traders who are knowledgeable about the value of what they are trading, those traders will tend to push market prices toward their efficient level.

This is not to say the endowment effect is not a real thing. People are subject to all manner of quirks in their thinking – among them the tendency to focus more on the positive aspects of the decisions they have already made, which leads to the endowment effect. The point is that the fact that rational man models are not perfect descriptions of the way we all make everyday decisions does not mean they can shed no light on the workings of markets – they capture the essence of some important aspects of decision making. We just need to keep in mind that they don't explain everything about the way anyone makes any particular decision.

Also, a point needs to be made about the difference between the endowment effect and sentimental value. If you have a watch that most people think has a value of $50, but it is worth $150 to you because it's been in your family for three generations, there is nothing irrational about this. Sentimental value is a perfectly rational reason to value a good more than someone else does. The endowment effect, however, is when someone values a good more than someone else simply because they own it. This is irrational and means that the person isn't able to properly value his or her own goods.

The Allais Paradox is best explained using an example. Suppose there are four gambles A, B, C and D, and each has the following probability distribution over three different prizes of $0, $1,000,000 and $5,000,000:



Probabilities:

Prize           A       B       C       D
$0              0       0.01    0.89    0.9
$1,000,000      1       0.89    0.11    0
$5,000,000      0       0.1     0       0.1

First, let's choose between gambles A and B. Lottery A is $1M for sure. Lottery B is a 1% chance of $0, an 89% chance of $1M, and a 10% chance of $5M. Typically, an individual will choose gamble A over B, to avoid the 1% chance of getting $0. Now, let's choose between gambles C and D. Lottery C is an 89% chance of $0 and an 11% chance of $1M. Lottery D is a 90% chance of $0 and a 10% chance of $5M. Typically, an individual will choose gamble D over C, since for an additional 1% chance of $0 they get a 10% chance of $5M. To find out why this pair of choices is paradoxical, let's work out the expected utilities of each gamble. The expected utilities of A and B are

E(UA) = 0U(0) + 1U(1) + 0U(5)

E(UB) = 0.01U(0) + .89U(1) + .1U(5)

Then, A is only preferred to B if the expected utility of A is greater than the expected utility of B, or

E(UA) > E(UB)

0U(0) + 1U(1) + 0U(5) > .01U(0) + .89U(1) + .1U(5)

U(1) > .01U(0) + .89U(1) + .1U(5)

.11U(1) > .01U(0) + .1U(5)

Similarly, D is only preferred to C if

E(UD) > E(UC)

.9U(0) + 0U(1) + .1U(5) > .89U(0) + .11U(1) + 0U(5)

.9U(0) + .1U(5) > .89U(0) + .11U(1)

.01U(0) + .1U(5) > .11U(1)



Since it is impossible for .11U(1) to be both less than and greater than .01U(0) + .1U(5), choosing A over B and choosing D over C is inconsistent. This inconsistency is taken to be a violation of the independence axiom. The independence axiom, in this context, means that things that are the same between two options shouldn't affect the decision. Looking back at the table of probabilities, gambles A and B both include an 89% chance of getting $1M. The real difference between A and B is that A has an extra 11% chance of getting $1M, while B has an extra 1% chance of getting $0 and an extra 10% chance of getting $5M. The 89% chance of getting $1M is common to the two gambles, so by the independence axiom it shouldn't affect our choice. If we now look at gambles C and D, they have in common an 89% chance of getting $0 – so this shouldn't affect our choice. Looking at everything else, C has an extra 11% chance of getting $1M, and D has an extra 1% chance of getting $0 and an extra 10% chance of getting $5M, which is exactly the same difference as between gambles A and B. This is why someone who chooses A over B but D over C is acting inconsistently with the expected utility model.

The Ellsberg Paradox is also best explained using an example. Suppose there is an urn with 60 marbles, where 1/3 of them are green and the other 2/3 are either orange or blue, in unknown proportions. Now suppose we are offered the following gambles:

I. You are paid $1,000,000 if you draw a green marble.
II. You are paid $1,000,000 if you draw a blue marble.

In general, an individual will choose option I over option II. Now suppose there are two more gambles:

III. You are paid $1,000,000 if you draw a green or an orange marble.
IV. You are paid $1,000,000 if you draw a blue or an orange marble.

In general, an individual will choose option IV over option III. These two decisions illustrate a paradox. Why? Let's look at the expected utilities of gambles I and II:

E(UI) = (1/3)U(1)

E(UII) = Pr(B)U(1)

where Pr(B) is the probability that you draw a blue marble. Remember, we aren’t told how many are blue and how many are orange, so this probability is unknown to us; that is, we have to make our own subjective guess about it. If we prefer I to II, then the expected utility of I must be greater than the expected utility of II, or

E(UI) > E(UII)

(1/3)U(1) > Pr(B)U(1)



1/3 > Pr(B)

Basically, preferring I to II means that we think the probability of drawing a blue marble is less than 1/3. Now, let’s look at the expected utilities of gambles III and IV:

E(UIII) = [1/3 + Pr(Or)]U(1)

E(UIV) = [Pr(B) + Pr(Or)]U(1)

where Pr(Or) is the probability that you draw an orange marble. Now, we will only prefer IV over III if the expected utility of IV is greater than the expected utility of III, or

E(UIV) > E(UIII)

[Pr(B) + Pr(Or)]U(1) > [1/3 + Pr(Or)]U(1)

Pr(B) + Pr(Or) > 1/3 + Pr(Or)

Pr(B) > 1/3

So preferring IV to III means that we think the probability of drawing a blue marble is greater than 1/3. But we've already concluded from choosing I over II that we think the probability of drawing a blue marble is less than 1/3. So, we are acting inconsistently in the context of this model. This is referred to as ambiguity aversion, and it basically means that people don't like gambles where they don't know the true, objective probabilities of each outcome.

Uses of the Expected Utility Model

In light of these problems, the expected utility model still has its uses. Some of the major ones are:

1. To guide individual decisions. If you believe that more is better, that you have a preference for variety, that your decisions should be transitive, etc., then the model can help guide your individual decision-making process. When the model was first developed, it was thought that this would be its primary function: to actually present individuals with their own utility function in order to help them make logical and consistent decisions. It turns out this hasn't really been the main application, but it is still possible for individuals to use the model to prevent errors in decision-making.

2. Description of individual behavior. Given our discussion of the Allais and Ellsberg paradoxes, the model falls short here. Also, as the Cornell and List experiments show, people may act in accordance with this model to some extent;



but this model is simply not accurate enough to describe all individual behavior.

3. As-if model for major decisions and experienced traders. We've seen that this model is most accurate when applied to individuals who are experienced in their market and dealing with gambles that have high stakes. This isn't to say that these people use the model explicitly when going through their decision-making process; only that the model helps analyze and predict the outcomes of experienced players making key decisions.

Henceforth we will use the model as described in number three above. We will soon be talking about contracting, where two parties enter into an agreement that is worth a lot of money and about which both parties know a great deal. In this setting, the expected utility model is applicable to the extent that it helps us predict the outcomes of these big decisions.



Chapter 10 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Expected Utility Theorem – Theorem stating that consumers make choices as if maximizing expected utility, given the payoff in each outcome, the probabilities of those payoffs, and their degree of risk aversion.

Expected Utility – The sum, over outcomes, of the probability of each outcome multiplied by the utility of that outcome.

Independence Axiom – An underlying assumption of the expected utility theorem stating that if a new lottery is compounded with each of two original lotteries with a given probability, it shouldn't affect a consumer's choice, because the difference between the two options remains unchanged.

Certainty Equivalent (CE) – The single amount of wealth, received for certain, that provides the same utility as the gamble. For risk neutral players, the certainty equivalent equals the expected value of the gamble. For risk averse players, it is less than the expected value of the gamble.

Risk Premium – The difference between the expected value of a gamble and its certainty equivalent; the amount a risk averse individual would give up to be rid of the risk, and the source of the value an insurance company can add by taking on gambles it values at their expected wealth.

Risk Pooling – Strategy used by insurance companies that allows them to behave risk neutrally by selling insurance to many customers with independent risks. Because of the law of large numbers, actual losses approach expected losses.

Diversification – Strategy that reduces risk by serving clients in many locations, making the insured risks independent so that risk pooling is effective and the insurer can create value by acting risk neutrally.

Endowment Effect – A phenomenon in which a person places a certain value on a good before buying it and then, after buying it, places a higher value on that good simply because he owns it.

Allais Paradox – An inconsistency with the expected utility model that violates the independence axiom: things that are the same between two options affect the decision.

Ellsberg Paradox – An inconsistency with the expected utility model that occurs when individuals fail to form consistent subjective probabilities in the face of ambiguous uncertainty.



Chapter 11 More on Production and Cost

Recall π = pq – C(q). C(q) here is our cost function. In reality, a cost function is very difficult to pin down, because everything from engineering to tax policy affects it. In this class, our cost functions represent economic cost, which is the cost of everything you give up for a specific decision, whether or not it's readily quantifiable. Note that economic cost does not always line up with accounting cost. For example, if you have a job with an annual salary of $100,000 and you are contemplating pursuing an MBA, part of the cost of that degree would be the $100,000 a year you would otherwise be making. The equation

Cost = Σi pi xi

is a general equation for the total cost of production, where pi is the price of input i (e.g., labor), and xi is the amount of that input used. We can always produce a given quantity by simply buying more inputs, but what we're interested in is minimizing our cost. Thus, when we refer to a cost function C(q), we mean the minimum possible cost of producing q units. There are two important assumptions behind this idea:

1. No pure waste – your managers are not buying extraneous inputs.
2. Choosing the best production process – among the alternatives, choosing the most efficient way to produce q units.

The first one is just generally good business practice; the second one requires more attention and will be one of the main subjects of this section. The next thing to define is our production function. This function tells us how many units we can produce with different amounts of our inputs. The general form is q = f(x1, x2, …, xn), where q depends on inputs x1 through xn. The marginal product of input xi is defined as

MPi = ∂q/∂xi

which is the partial derivative of the production function with respect to xi. This is just the rate of change of quantity with respect to xi. We usually restrict our production function to just two inputs, capital (K) and labor (L), so the production function is q = f(K, L)



and the marginal products of labor and capital are

MPL = ∂q/∂L and MPK = ∂q/∂K.

Note that when looking at the marginal product of labor, we are looking at how much quantity changes for an additional unit of labor. It's how much product one more unit of labor provides. This is an important concept and will make later material much easier to understand.

Input Substitution

Looking at a standard job, such as digging a hole for a pool, we could either spend a lot of money on capital (machinery) to do the digging, or we could spend a lot of money on labor (man hours) and let the work force do the digging. Since there are multiple ways of doing it, it's easy to imagine that there's some optimal allocation where we are maximizing our inputs' cost effectiveness. Now let's define two extreme types of substitutes.

1. Perfect substitutes – inputs that can be swapped for one another without any effect on output. Red pencils and blue pencils are examples, because they both accomplish exactly the same thing.

2. Perfect complements – inputs that must be used in fixed proportions. There is a very specific ratio of carbonated water to flavoring in Coca-Cola; as soon as the proportions change at all, the end product is no longer the same.

If you are dealing with perfect substitutes, you just buy whichever input is cheapest. With perfect complements, you have no discretion over the choice of inputs. In most cases, however, inputs are imperfect substitutes, which fall somewhere between perfect substitutes and perfect complements. This is where the decision-making process comes in. It follows that the degree of substitutability between your inputs comes into play when determining their optimal allocation. The marginal rate of technical substitution of input i for input j, or MRTSij, is one way to measure substitutability. In our example, MRTSLK is how much less capital you could use if you had one more unit of labor, while maintaining the same output level. Suppose MPL = 10 and MPK = 5. With one more unit of labor you would have 10 more units of output; thus, you could give up two units of capital and maintain your output. In this example, MRTSLK = 2. In general, the definition is

MRTSLK = MPL / MPK

so the more productive labor is relative to capital, the more capital you could give up for one more unit of labor.



Suppose we increase labor and decrease capital to take advantage of this relative productivity advantage. What happens to the marginal products? Well, with the addition of a person to the work force, there are fewer tools per person, so MPL falls (remember, MPL is the increase in output from one more unit of labor). What about the tools themselves? There are fewer of them, so each tool is being put to more use. Thus, MPK increases. Putting both of these together, it follows that MRTSLK decreases. This should make intuitive sense; as you add more and more labor, the amount of capital that can be freed up by one additional person decreases.

Example: Cost – MRTS

Suppose our production function is q = 4L^0.5 K^0.5. Find MRTSLK.

Solution: First find MPL. Taking the partial derivative with respect to L, we see

MPL = ∂q/∂L = 0.5 · 4 · K^0.5 · L^(0.5−1) = 2(K/L)^0.5

and (similarly) MPK = 2(L/K)^0.5.

Now looking at MRTSLK we see

MRTSLK = MPL / MPK = 2(K/L)^0.5 / [2(L/K)^0.5] = (K^0.5/L^0.5) · (K^0.5/L^0.5) = K/L
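As a quick check of this algebra, here is a small Python sketch (our own illustration) that approximates the marginal products by finite differences and confirms MRTSLK ≈ K/L at an arbitrary input bundle:

def q(L, K):
    # The example's production function: q = 4 * L**0.5 * K**0.5.
    return 4.0 * L**0.5 * K**0.5

def mrts_lk(L, K, h=1e-6):
    # Approximate the marginal products by finite differences.
    mpl = (q(L + h, K) - q(L, K)) / h
    mpk = (q(L, K + h) - q(L, K)) / h
    return mpl / mpk

print(mrts_lk(9.0, 36.0))  # approximately K/L = 4.0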

_________________________________________________________________________________________________

We will now introduce isoquants, which are graphs that represent the substitutability of our inputs. These graphs show different combinations of inputs that produce a single amount of output; that is why they are called isoquants. [Figure: isoquants in (x1, x2) space – a straight line (perfect substitutes), an L-shaped curve (fixed proportions), and a bowed-in curve (imperfect substitutes).] In the figure, the straight line represents perfect substitutes: for one less unit of x1, it takes a constant amount of additional x2 to produce the same amount of output. The L-shaped curve represents perfect complements: no matter how many more units of x1 you have, it must be combined with x2 in a specific ratio to achieve the end product. The curve that falls in between represents imperfect substitutes, such as labor and capital. There are a few important takeaways from the graph of imperfect substitutes:

1. Slopes down – this is just another way of saying that marginal product is positive. Since marginal product is positive, one more unit of x1 will contribute to q, and thus we will need somewhat less of x2.

2. Bowed in – this is the more important conclusion. You can see that as x1 increases a lot, the curve flattens out. This can be interpreted as: the more of one input you have, the lower its marginal product.

3. Isoquants cannot cross.

To look more specifically at #2, look at the following graph, which produces q0 units: [Figure: an isoquant for q0 units in (L, K) space; the negative of the slope at any point is MRTSLK, and the ∆K freed up by one more unit of L shrinks as L grows.]


In the top part of the graph, 1 more unit of labor frees up a lot of capital, represented by the large ∆K; however, as we add more and more labor, ∆K becomes smaller. Also, the negative of the slope of the isoquant is just MRTSLK. For the final point about isoquants, look at the following graph:

[Figure: two crossing isoquants q1 and q2 in (L, K) space, with points a, b, c, and d marked.]

Comparing points a and b, we notice that point a has more of both capital and labor. Thus, we can conclude that q2 > q1. However, comparing points c and d, we can see that c has more of both labor and capital, and thus q1 > q2. Since both of these cannot be true, we have our third property: isoquants cannot cross. Now that we have defined isoquants, we can use them to illustrate how to minimize cost. First it is necessary to introduce our cost curves. These are called iso-costs, as each represents a single level of cost. Since we are using only labor and capital, our cost equation is



C = wL + rK

where w is the wage rate, or the cost of labor, and r is the rate of interest, or the cost of capital (think of it as the rate of interest being charged on the machinery we're renting, or the interest being charged on a loan we took out to buy the machinery). It's easier to draw if we solve for K:

K = C/r − (w/r)L

Since K is on the y-axis and the equation is in y = mx + b form, the intercept is C/r and the slope is −(w/r). Let's look at a graph with both kinds of curves on it:

[Figure: three iso-cost lines with intercepts C1/r < C0/r < C2/r, all with slope −(w/r), and the isoquant for q0; the C0 line is tangent to the isoquant.] In this illustration there are three iso-cost lines. The cost of labor (w) and the cost of capital (r) do not change; the only thing that differs across the lines is total cost. The iso-cost line C1 doesn't buy enough inputs anywhere to reach the q0 isoquant. The line C2 reaches it in two places, but there is a better solution. The line C0 is the best solution, since it reaches the isoquant at the minimum cost: C0 touches the isoquant in just one place. Thus, to minimize cost, you want an iso-cost line that is tangent to your isoquant. [Figure: the cost-minimizing tangency, at capital K* and labor L*.] In the tangency figure, K* is the optimal amount of capital and L* is the optimal amount of labor. Since the derivative is nothing more than the slope of the tangent, the slope of the isoquant at (L*, K*) and the slope of the iso-cost line are the same. Thus, when

−MRTSLK = −(w/r)

or

MRTSLK = w/r

you are producing at minimum cost. Looking at the definitions of both sides again, it should make sense why this is true. MRTSLK is the rate at which (internally) you can substitute capital



for one more unit of labor, keeping output constant. And w/r is the market rate at which you can give up capital to buy another unit of labor, keeping cost constant. In order to minimize cost, the rate at which you can substitute capital for labor internally has to equal the rate at which the market allows you to do so.

If we write MRTSLK as MPL divided by MPK (which is just its definition), we can rearrange the equation:

MPL/MPK = w/r  ⇔  MPL/w = MPK/r

which says each input's marginal product divided by its price should be equal. To further explain, the marginal product of labor divided by its cost is how much output you can get from spending $1 on labor. Think of this as "bang per buck." Thus, when this equality holds, the last dollar spent on labor provides exactly the same output as the last dollar spent on capital. Note that whichever side of the equality is higher indicates the input you should use more of, as it is more productive for the same amount of money ($1). The final way to explain this equality requires that we rearrange the equation as follows:

MPL/w = MPK/r  ⇔  w/MPL = r/MPK

which says the cost of each input divided by its marginal product should be equal. This is the cost of obtaining one more unit of output using each input. Thus, when it costs the same to obtain one more unit of output using labor as using capital, your inputs are optimally allocated and you are minimizing cost. Note that in this case the smaller number is better, since it's essentially the marginal cost of obtaining one more unit of output.
_________________________________________________________________________________________________

Example: Cost – Optimization Condition

Suppose MPL = 10, MPK = 5, w = 20, and r = 5. Which input (labor or capital) should we use more of? Solution: We can look at this in any of the three ways described above. Let's use the "bang per buck" method.

MPL/w = 10/20 = .5

MPK/r = 5/5 = 1

This says that for $1 spent on labor we get 0.5 more units of output, but for $1 spent on capital we get 1 more unit. Thus, we should be using more capital.
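A minimal Python sketch of this comparison (our own illustration, using the example's numbers):

mp_l, mp_k = 10.0, 5.0   # marginal products from the example
w, r = 20.0, 5.0         # input prices from the example

bang_labor = mp_l / w    # 0.5 units of output per dollar of labor
bang_capital = mp_k / r  # 1.0 units of output per dollar of capital

if bang_capital > bang_labor:
    print("use more capital")   # this branch fires here
elif bang_labor > bang_capital:
    print("use more labor")
else:
    print("inputs are already optimally allocated")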



_________________________________________________________________________________________________

So in order to minimize cost, two things must hold:

1. Optimization condition: MRTSij = Pi/Pj, or MPi/Pi = MPj/Pj, or Pi/MPi = Pj/MPj

2. Production constraint: q = f(x1, x2, …, xn)

The optimization condition can be expressed in any of the three ways above, and it's important that you be able to explain what they mean. It basically means your inputs are allocated efficiently. The production constraint just means that if you want to produce 10 units of output, you must have enough inputs to physically make 10 units. Note that the production constraint doesn't take into account any sort of efficient allocation of inputs. Thus, solving 1 and 2 together (simultaneously) provides us with a cost function C(q), which tells us the minimum cost of producing q units.
_________________________________________________________________________________________________

Example: Cost function

Suppose our production function is q = 4L^0.5 K^0.5, the cost of labor is w = 20, and the cost of capital is r = 5. Find the cost function.

Solution: We know both conditions (optimization, production) have to hold. Let's first look at optimization. Using the form MRTSLK = w/r, we see

MRTSLK = MPL/MPK = K/L

which was found by dividing the two partial derivatives (marginal products). For a more detailed explanation of the algebra, refer back to the example on MRTS. The optimization condition tells us

MRTSLK = w/r

so

K/L = 20/5 = 4  ⇔  K = 4L

which is telling us for each unit of labor we use, we should use 4 units of capital. Now that we have the optimization condition, we can substitute it into the production function to solve our system of equations. The production function was given to us in the problem. Thus,

q = 4L^0.5 K^0.5

q = 4L^0.5 (4L)^0.5

q = 8L

(substituting in our optimization condition)



L* = q/8 and, since K = 4L, K* = q/2

where L* and K* are the optimal amounts of labor and capital. This tells us how much capital and labor we need to produce q units of output in the most efficient way. To get our cost function, we simply multiply each input's price by the number of units of that input we're using. So

C(q) = wL* + rK* = 20(q/8) + 5(q/2) = 5q

and the minimum cost of producing q units is 5q. Note that if the question had asked what amount of labor would be needed to minimize the cost of producing 15 units, we could simply plug 15 into our equation L = q/8.
_________________________________________________________________________________________________

Now let's consider changing input prices. Looking at the optimization condition,

MPL/w = MPK/r,

suppose w (the cost of labor) increases. This must mean that MPL/w decreases. Thus, to restore the equality, we will use less labor and more capital. Graphically, as the wage increases from w0 to w1, our iso-cost line changes slope and no longer reaches our isoquant of q units. Notice that the y-intercept of the original iso-cost line doesn't change, since the cost of capital (r) didn't change. Thus, total cost must rise in order to get the new iso-cost line C1 to reach our isoquant. It's important to realize that C1 is greater than C0; this is because C0 could not reach the isoquant. [Figures: top, the iso-cost line pivoting inward as w rises from w0 to w1, with the higher-cost line C1 restoring tangency with the isoquant; bottom, the new tangency with less labor (L*1 < L*0) and more capital (K*1 > K*0).] In the second figure, the amount of labor (L*) decreases and the amount of capital (K*) increases, as we would expect following an increase in the cost of labor.

The fact that our allocation of labor and capital changed is not enough to conclude that C1 > C0. This conclusion is drawn because once the wage has increased, our original iso-cost line (which, remember, shows combinations of labor and capital with the same cost C0) is no longer able to reach the isoquant. This is why the new cost is higher than the old.
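The whole derivation can be checked numerically. The sketch below (our own illustration) searches over labor for a target output, backing out the capital required by the production constraint, and recovers C(q) = 5q as well as the effect of a wage increase:

def cost_given_labor(L, q, w=20.0, r=5.0):
    # Capital needed to produce q with labor L: q = 4*(L*K)**0.5
    # implies K = q**2 / (16*L).
    K = q**2 / (16.0 * L)
    return w * L + r * K

def min_cost(q, w=20.0, r=5.0):
    # Crude grid search over labor; fine for illustration.
    return min(cost_given_labor(0.01 * i, q, w, r) for i in range(1, 100_000))

print(min_cost(16.0))          # about 80.0, matching C(q) = 5q
print(min_cost(16.0, w=40.0))  # a higher wage raises the minimum cost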



We've just seen how to use the optimization condition and production function to derive a cost function. Now when you see a general profit function π = pq – C(q), you know what C(q) represents. The following is another way of arriving at the same cost function; it is the method a computer uses to solve the same system of equations we solved by algebra. It is a supplemental topic, not essential for the class, but it has occasionally appeared as an extra credit question on a test.

The Calculus of Cost Minimization – the Lagrangian

The way the computer looks at the problem is the following.

Minimize

wL + rK

subject to

q ≤ f(L, K)

which just means the computer will use calculus to minimize cost while making sure there are enough inputs to achieve the desired level of output. The Lagrangian is

ℒ = wL + rK + λ[q − f(L, K)]

where λ is the Lagrange multiplier. Then, minimize by solving the first-order conditions for L, K, and λ. Looking at the partials, we see

∂ℒ/∂λ = q − f(L, K) = 0  ⇒  q = f(L, K)

which is just our production function. The partials for the other two variables are

∂ℒ/∂L = w − λMPL = 0  ⇒  w = λMPL and ∂ℒ/∂K = r − λMPK = 0  ⇒  r = λMPK

and dividing these two equations we get

w/r = (λMPL)/(λMPK) = MPL/MPK

which is just our optimization condition. Thus we see the use of the Lagrange multiplier leads to the same set of equations that need to hold. Some students like to use this method to solve problems, but again, there will never be a problem that explicitly requires you to use it.
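For the curious, here is a sketch of how one might let the computer solve the same problem numerically with SciPy (assuming SciPy is available; the starting point and tolerances are arbitrary choices of ours):

from scipy.optimize import minimize

w, r, q_target = 20.0, 5.0, 16.0

def total_cost(x):
    L, K = x
    return w * L + r * K

# Require output of at least q_target: 4*sqrt(L*K) - q_target >= 0.
constraint = {"type": "ineq",
              "fun": lambda x: 4.0 * (x[0] * x[1]) ** 0.5 - q_target}

result = minimize(total_cost, x0=[1.0, 1.0], method="SLSQP",
                  constraints=[constraint],
                  bounds=[(1e-6, None), (1e-6, None)])

print(result.x)    # approximately (L*, K*) = (2, 8)
print(result.fun)  # approximately 80 = 5 * q_target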

Cost in the Short Run

Up until this point, we've assumed that managers have complete control over how much capital and labor to buy; they are not limited in their input decisions. In the short run, however, some of the inputs may be fixed. Thus, STC(q) is the short-run (total) cost function: the minimum cost of producing q units given a fixed amount of one (or more) input(s).



In the short run, there are fixed and sunk costs, and it's important to understand the difference. A cost is fixed if it doesn't change as quantity changes; think of it as the opposite of a variable cost. A sunk cost is money that cannot be retrieved. If we take out a loan to build a factory and pay $10,000 a month in interest, that cost is fixed. If the factory cost $1,000,000 to build, that is not necessarily all sunk, because if we could sell the factory for $900,000 we could retrieve some of it. In this example, the $100,000 of the factory's cost that we can't get back would be our sunk cost. Given the definition of the short run, we can define some other common terms that will be helpful.

• SMC (short-run marginal cost): the derivative of STC, or dSTC/dq

• SATC (short-run average total cost): total cost per unit, or STC/q

• SAVC (short-run average variable cost): variable (total − fixed) cost per unit, or (STC − SFC)/q

• SAFC (short-run average fixed cost): fixed cost per unit, or SFC/q

We're now going to explain the graphs of all these curves. Let's first look at SMC. [Figure: an upward-sloping SMC curve.] Why do we illustrate marginal cost as increasing? Diminishing returns to a fixed factor. Remember that in the short run, there is some input that is fixed; for example, a plant. If you're producing in a single plant, at some point, as you try to cram more and more workers into that one plant, each worker's effectiveness will decrease. Thus, decreasing marginal productivity with a fixed factor means increasing marginal costs. [Figure: MC, AVC, AFC, and ATC = AFC + AVC curves.] The main assumption in the short run is that there is a fixed factor, so marginal cost is increasing. Average fixed cost decreases as quantity increases, since our fixed costs are held constant in the short run. Average variable cost may fall at first, if adding workers to a factory initially makes everyone more specialized and productive. However, for the same reason that marginal cost increases – diminishing returns to a fixed factor – average variable cost will eventually rise. Average total cost, which is just the sum of the AFC and AVC curves, therefore must eventually rise as well.



One important thing to note about the above graph is that the marginal cost curve crosses the average variable cost curve where average variable cost is at its minimum. There is no deep economic reason for this; it's just the way the math works out. The easy way to think of it is in terms of test grades. If your average is an 80 and you get a 90 on a test, it will pull up your average. Think of marginal cost as your next test grade, and average variable cost as your overall average. The only time the next test would not change your average is if it were equal to your average (i.e., a test score of 80 wouldn't change a class average of 80). Thus, where they cross, average variable cost is neither rising nor falling, which is why MC crosses AVC at its minimum point. This is the same reason the MC curve crosses the ATC curve at its minimum.

To clear up some terminology issues that may arise later, let's define a tricky cost. Say your factory is built, but you aren't producing any units. Any money you're paying just for the factory is a fixed cost, since it doesn't vary with output (your current production level being 0). If you decide to produce even one unit, and have to incur another $10,000 in order to start your assembly line, clean your machines, etc., we refer to these start-up costs as variable costs because they are avoidable. Even though they don't change between producing 1 and 100 units, they are incurred between producing 0 and 1 unit, and thus are lumped into the variable cost category. Thus, the costs wrapped up in the AFC curve are those that are not avoidable; those that are avoidable are contained in the AVC curve.
_________________________________________________________________________________________________

Example: Short-run Cost

Using the same production function, cost of labor, and cost of capital as in the cost function example, if our amount of capital is fixed at K = 4, what is the new minimum cost function?

Solution: From earlier, our production function was q = 4L^0.5 K^0.5, r = 5, and w = 20. Since K is fixed, we can just plug it into the production function.

q = 4√L · √4 = 8√L

q² = 64L  ⇒  L = q²/64

Notice there is no need to allocate resources efficiently between labor and capital (as in the previous example), since the amount of capital we have is fixed; in other words, we don't have the freedom to parcel out our resources as economically as possible. Our short-run total cost is STC = wL + rK



STC = 20(q²/64) + 5(4)

STC = (5/16)q² + 20

If you remember from our first example (where capital wasn't fixed), our cost function was 5q. Thus, the fixed factor has increased our minimum cost, as we would expect. The rest of our costs are as follows (remember, in this case capital is fixed and labor is variable):

AFC = rK/q = 20/q

MC = d/dq[(5/16)q² + 20] = (10/16)q = (5/8)q

AVC = wL/q = (5/16)q²/q = (5/16)q
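These short-run cost formulas are easy to tabulate; here is a small Python sketch of our own:

def stc(q):
    # Short-run total cost with K fixed at 4: (5/16)q**2 + 20.
    return (5.0 / 16.0) * q**2 + 20.0

def smc(q):
    return (5.0 / 8.0) * q    # derivative of STC

def savc(q):
    return (5.0 / 16.0) * q   # variable cost per unit

def safc(q):
    return 20.0 / q           # fixed cost per unit

for q in (4, 8, 16):
    # SATC is SAVC + SAFC, which also equals STC/q.
    print(q, stc(q), smc(q), savc(q) + safc(q), stc(q) / q)
_________________________________________________________________________________________________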

Long-run cost curves

In the long run, there are no sunk costs. Thus, you have complete control over the amount of each input you choose to use. We treat all costs as variable, since variable costs by definition are those that can be avoided; in the long run you can plan to build or sell a factory, which is why there are no fixed (unavoidable) costs. The standard long-run average cost curve is shown below.

[Figure: the U-shaped LRAC curve in (q, $) space, with IRS at low output, CRS in the middle around the M.E.S., and DRS at high output.]

The reason the LRAC curve is U-shaped is the following. At low quantities, we can take advantage of economies of scale, and our average cost falls. It's at this point that adding more machinery or workers is beneficial. This is also called increasing returns to scale (IRS). Then there's a region in the middle where adding another machine or worker adds about as much output as it costs, so average cost doesn't change much. This is called constant returns to scale (CRS). Over the final part of the curve, average cost increases, which suggests some decreasing efficiency as output rises. In our short-run model, this was because we were restricted by a fixed factor (i.e. one plot of land, one factory). Why would average cost increase in the long-run? We can build as many factories as we like, or hire as much labor as we need; we aren't restricted by any fixed factors. The replication argument suggests that as each of your production facilities reaches its constant returns to scale (at what is also called the minimum efficient scale), you should just reproduce that facility and thus avoid the increasing costs at higher outputs. There is a fallacy in this argument, however. Looking at an individual firm, we know that there are several workers at the bottom, some middle managers who manage those workers, and then some sort of governance structure at the top that manages the long-term direction of the firm. If the replication argument held, then to obtain higher outputs we would simply open new factories, each at our M.E.S. (minimum efficient scale), to avoid higher costs. The problem is that as the efficient process is duplicated, there is more information for the managers at the top to process. Thus, as the firm grows in size, top management becomes less efficient, and costs go up. This is the reason for increasing average costs at high quantities of output, or decreasing returns to scale (DRS).

The long-run marginal cost curve has the same property as the short-run marginal cost curve, in the sense that it crosses the average cost curve at its minimum. This is shown in the top figure below. If we add a demand curve to our LRAC curve, as in the bottom figure below, we can see that the different economies of scale a firm may encounter as it increases production may not matter. Looking at demand curve d0, we see that total industry demand will never be enough for our factories to encounter constant or even decreasing returns to scale. This situation lends itself to a monopoly, since it's most efficient for one firm to produce the quantity demanded.

[Figure, top: the LRMC curve crossing LRAC at its minimum. Figure, bottom: the LRAC curve with two demand curves, d0 cutting LRAC in its downward-sloping (IRS) region and d1 cutting it beyond the M.E.S.]

If instead demand is given by demand curve d1, there may be several firms producing at M.E.S. without exceeding demand. In general, the greater the demand for a product, the more firms can "fit" in the market, which makes intuitive sense.



Finally, let's look at both short-run and long-run average cost curves on the same graph. Remember, LRAC gives the minimum cost of producing q units; it isn't subject to any sunk (or fixed) costs. At quantity q0, we get our minimum cost by looking at the LRAC curve. If we build a plant specifically designed to produce q0 units efficiently, SRAC and LRAC will be the same at q0. However, if we use that plant to produce anything other than q0 units, SRAC > LRAC, because in the short-run we are limited by a fixed factor. This is true in general: short-run average cost is higher than long-run average cost at any given quantity, unless the plant is producing the amount of output it was designed for.

[Figure: an SRAC curve tangent to the LRAC curve at q0 and lying above it everywhere else.]
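A quick numeric check (again our own sketch, not the author's) ties this to the earlier examples: in the long run the cost function was 5q, so LRAC = 5, while with K fixed at 4 the short-run average cost is SRAC(q) = ((5/16)q² + 20)/q. SRAC is never below 5 and touches it only at q = 8, the output this plant is "designed for":

    # Compare SRAC (K fixed at 4) to LRAC = 5 from the earlier examples.
    def srac(q):
        return ((5 / 16) * q**2 + 20) / q

    LRAC = 5.0
    for q in [2, 4, 8, 12, 16]:
        print(q, round(srac(q), 3), srac(q) >= LRAC)
    # SRAC bottoms out at q = 8, where SRAC = 2.5 + 2.5 = 5 = LRAC;
    # at every other quantity SRAC is strictly above 5.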

Minimizing Costs with Multiple Plants
Many firms have multiple plants, which may use different production techniques. For a simple introduction, let's assume we have two plants, A and B. If all "fixed" costs are sunk, then the decision becomes how much to produce in plant A and how much to produce in plant B. Suppose that initially MCA < MCB. You should produce initially in plant A, since it's cheaper to do so. However, we know that as we produce more in one plant, diminishing returns to a fixed factor will increase MCA. At some point, it will be as expensive to produce another unit in A as in B. At that point, it makes sense to use both plants. If ever the marginal cost is lower in one plant than in the other, cost could be reduced by reallocating output: produce more in the cheaper plant, which would increase its MC, and less in the other, which would decrease its MC. The only time output should not be reallocated is when the marginal costs are equal across plants. Thus, the optimal allocation of production across both plants occurs when MCA = MCB. This is illustrated in the figure described below. We see that at quantities up to q0, MCA < MCB, so we just produce in A. Eventually MCA increases (because of diminishing returns) and we switch to using both plants. Remember, we want MCA = MCB, so for a given level of marginal cost ($*), we see how much we can produce in plant A (qA*), how much we can produce in plant B (qB*), and add the horizontal distances together to obtain how much our firm can produce at that marginal



cost (q*). Since this construction gives, for each marginal cost level, the total quantity the firm can produce at minimum cost using both plants, the horizontal sum (labeled MCFIRM in the figure) becomes our firm's MC curve. In summary, if fixed costs are sunk, produce where MCA = MCB.

Suppose "start-up" costs are not yet sunk. In other words, we can avoid certain costs by not producing any units in a given plant, but as soon as we produce any output at all there, the cost becomes fixed. For example, it might cost $10,000 to start up a production line; once the line is ready to go into production, that $10,000 is sunk and cannot be recouped. Thus, for very low quantities, we would want to use only the plant with the lowest start-up costs. Suppose plant B has the lowest start-up costs, even though it has a higher marginal cost than plant A. Then we would use only plant B for small enough output levels, to save on start-up costs. For very high quantities, the start-up costs become less important, and you would want to minimize the variable costs. This becomes the same situation as before, and you would want to use both plants and produce where MCA = MCB. At some intermediate quantity, it may make sense to use just the plant with the lowest MC, but this is not certain - it depends on the particulars of the situation. To conclude, note that there are three possibilities for producing q units: produce them all in plant A, produce them all in plant B, or (efficiently) split production between plants A and B. The previous paragraph gave some rules of thumb for choosing which plants to use at various quantities, but these can always be verified simply by testing all three methods.
_________________________________________________________________________________________________
Example: Multiple Plants
Suppose there are two plants with cost functions C(q1) = 20 + 0.25q1² and C(q2) = 10 + q2², and the fixed costs in plant 1 are sunk. What is the minimum cost of producing 20 units?
Solution: A production level of 20 units is neither especially high nor low, so let's test all three methods and see which one is cheapest. Note the fixed costs in each plant are the part of the cost function that does not depend on q (20 in plant 1 and 10 in plant 2). First, the cost of producing all 20 units in plant 1 alone:

C(q1) = 20 + 0.25(20)² = 20 + 0.25(400) = 120

Now, the cost of producing all 20 units in plant 2 alone. The problem tells us the $20 of fixed costs in plant 1 are sunk, so even though we're not producing any units there, we must still count them:

C(q2) = 10 + (20)² + 20 = 30 + 400 = 430

Finally, the cost of producing 20 units using both plants. When using both plants we want to set MC1 = MC2.



MC1 = d[C(q1)]/dq1 = 0.5q1

and

MC2 = d[C(q2)]/dq2 = 2q2

Setting these two equal, we get

0.5q1 = 2q2
q1 = 4q2

which says that for every unit we produce in plant 2, we should produce 4 in plant 1. Since we have only one equation and two unknowns, we can't solve it alone. However, we know that the total quantity q must be the amount produced in plant 1 plus the amount produced in plant 2:

q = q1 + q2

Substituting in our equation from setting marginal costs equal:

q = 4q2 + q2 = 5q2
q2 = q/5

which means one fifth of total quantity is produced in plant 2. Looking again at the total quantity equation,

q = q1 + q/5
q1 = 4q/5

so the other four fifths are produced in plant 1. To find the total cost of production using both plants, we add the cost of producing q1 units in plant 1 to the cost of producing q2 units in plant 2:

C(q) = 20 + 0.25(q1)² + 10 + (q2)²
C(q) = 30 + 0.25(0.8q)² + (0.2q)²
C(q) = 30 + 0.2q²

Since q = 20 units, we can see how much this would cost:

C(20) = 30 + 0.2(20²) = 110

Thus, to minimize cost, we should use both plants, producing 16 units in plant 1 and 4 units in plant 2 (found using the equations for q1 and q2 above).
________________________________________________________________________________________________
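Since the rules of thumb "can always be verified simply by testing all three methods," here is a short Python sketch (ours, not from the text) that automates the check for this example:

    def cost_plant1_only(q):
        # Plant 2 never starts up, so its 10 is avoided entirely.
        return 20 + 0.25 * q**2

    def cost_plant2_only(q):
        # Plant 1's fixed cost of 20 is sunk, so it is paid regardless.
        return 20 + 10 + q**2

    def cost_both(q):
        # MC1 = 0.5*q1 and MC2 = 2*q2 are equal when q1 = 4*q2, so q2 = q/5.
        q2 = q / 5
        q1 = q - q2
        return 20 + 0.25 * q1**2 + 10 + q2**2

    q = 20
    print(cost_plant1_only(q), cost_plant2_only(q), cost_both(q))
    # -> 120.0 430 110.0: splitting output 16/4 across the plants is cheapest.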



Chapter 11 Terminology
The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Economic Cost - The cost of the decision made plus the opportunity cost of the next best alternative given up. This includes both quantifiable and qualitative costs.

Opportunity Cost - The value of an alternative given up when making a decision.

Accounting Cost - Strictly the monetary cost of a decision. This does not include opportunity costs.

Marginal Product - The additional output achieved when adding another unit of an input. Mathematically, it is the partial derivative of the production function with respect to the input in question.

Capital (K) - Factors of production used to make goods and services.

Perfect Substitutes - Goods that share similarities and thus can be replaced or substituted by one another. In input substitution, these are inputs that, when substituted, have no effect on output. In this case, a manager would just buy whichever input is cheapest.

Perfect Complements - Goods that are purchased with one another. In input substitution, these are inputs used in fixed proportions. In this case, a manager has no discretion over the choice of inputs.

Imperfect Substitutes - Goods that fall somewhere between perfect substitutes and perfect complements. Decision-making is important with these goods because a manager must choose the most efficient input combination in order to determine the optimal allocation of inputs.

Marginal Rate of Technical Substitution (MRTS) - A tool for measuring substitutability. It is the rate at which one input can be substituted for the other while producing the same level of output.

Isoquant - A graphical representation of the substitutability of inputs. It shows the different combinations of inputs that produce a single level of output.

Isocost - A line that shows the different combinations of inputs that cost the same amount.

Lagrangian - A way to set up a constrained optimization problem for finding a minimum or maximum of a function subject to constraints. This can be used to minimize cost subject to a production constraint or maximize utility subject to a budget constraint.

Sunk Cost - Money spent in the past that can never be retrieved, regardless of any decision made.



Short Run - Time period in which there is a fixed factor of production (e.g. a production plant) and fixed costs cannot be avoided if the manager decides to shut down the firm.

Long Run - Time period in which there are no fixed factors of production. All fixed costs can be avoided and there are no sunk costs. A manager in a long run situation has complete control over the amount of each input used.

Start-Up Cost - A cost incurred only when a manager chooses to start production in a plant. Since this type of cost can be avoided, it is classified as a variable cost.

Long Run Average Cost - A curve made up of the minimum points of all of the short run average cost curves.

Economies of Scale - Benefits that a firm experiences from expansion in the long run, up to a certain point.

Increasing Returns to Scale (IRS) - In the long run, the region of low output where output rises more than proportionally with inputs, so average cost falls.

Constant Returns to Scale (CRS) - In the long run, the region of intermediate output where output rises proportionally with inputs, so average cost is roughly constant.

Decreasing Returns to Scale (DRS) - In the long run, the region of high output where output rises less than proportionally with inputs, so average cost rises.

Minimum Efficient Scale (MES) - The point at which long run average cost is minimized. When producing high levels of output, a manager will want to open many plants operating at MES instead of producing all the output in one plant.

Replication Argument - The argument that, as each production facility reaches its constant returns to scale or MES, the facility should be reproduced, thus avoiding the increasing costs at higher outputs. A fallacy in this argument is the growing volume of information to be processed by top-level management, leading to inefficiency and higher costs (decreasing returns to scale).



Part 4 Game Theory ‐ Modeling Strategic Interaction



Chapter 12
One Shot Games with Discrete Choices
In markets where firms are neither pure monopolists nor perfect competitors (price takers), one firm's actions have significant impacts on the others. Thus, each firm takes account of this interdependence when making decisions. Game theory is a tool that allows us to analyze how decisions are made in environments involving such strategic interdependence. When we analyze situations (games) in which strategic decisions are weighed against one another, we are basically searching for "reasonable" solutions to the game - that is, predictions about the way "rational" players would play.

A simultaneous game is one in which both players move at the same time. Think of a football game, where on each play both sides must make their best guess about what the other will do. A sequential game is one in which the first player moves, and then the second player moves. Think of a game of chess: the player moving second already knows what decision the first player has made, and factors that into what they plan to do. Finally, a one-shot game is one that is played just once, while a repeated game is one that is played many times.

The basic building blocks for describing a game are the following:
1. A list of players
2. A list of decision nodes (points in the game where a decision is to be made) and the choices available at each node
3. Payoffs (what they are playing for) for every possible outcome

In order to analyze a game, we take the basic description of the game and use it to identify all possible strategies for each player - that is, all the ways it is possible for them to play the game, conditional on all the different situations in which they might possibly find themselves while playing. The number of ways even a simple game of tic-tac-toe can unfold is huge (there are 9! = 362,880 possible orderings of moves alone, and the number of complete strategies is far larger). So, strategies get complicated even for simple games. Fortunately, the games we will look at are very simple, focusing on only the one or two most important strategic aspects of a situation.

One Shot Simultaneous Move Games
We will start with the simplest case, a one-shot, simultaneous move game. Remember, a one-shot game means that after the game the players aren't concerned about any ramifications their decisions may have. Also, a simultaneous game is one in which all players must act without knowledge of their competitors' decisions.



The Prisoner's Dilemma
Imagine Luke and Sam are criminals who have been arrested and charged with a crime. The prosecution says there is enough evidence to convict both of them of some minor crime, for which they will assuredly go to jail for 6 months. The prosecutors then take each criminal into a separate room and tell him that if he testifies against his partner, they will drop the 6-month charge against him, while his partner will get 24 months for being convicted of the larger crime. Each criminal has no idea what his partner chose, and has to make the decision without cooperation. To look at this game, we set up a matrix describing all possible outcomes (which is called the normal form):

                          SAM
                  Testify           Not
LUKE   Testify    L: 24, S: 24      L: 0,  S: 30
       Not        L: 30, S: 0       L: 6,  S: 6

(Payoffs are months in prison, so smaller numbers are better.)

To clarify the payoffs:

• TOP-LEFT: Luke testifies, and Sam testifies. Since the deal was to get the minor charge dropped for testifying, both players are guilty of only the major charge (24 mos.).
• TOP-RIGHT: Luke testifies, Sam doesn't. Therefore Luke gets the minor charge dropped, and Sam is guilty of both the minor (6) and major (24) charges.
• BOTTOM-LEFT: Luke doesn't testify, Sam does. Sam gets the minor charge dropped, and Luke is guilty of both (30).
• BOTTOM-RIGHT: Neither testifies. Thus, they are both guilty of only the minor charge (6).

Now that we have the payoffs, we can look at this game from each player's perspective, and look at their best responses given their isolated knowledge. First, let's define some special types of strategies:

• Strongly Dominant: A strategy that always has a higher payoff than your other strategies, regardless of your opponent's choice.
• Weakly Dominant: A strategy that always has at least as high a payoff as your other strategies, regardless of your opponent's choice.

These will make more sense when we finish our example of the Prisoner's Dilemma.

223


Luke's best responses (marked with *):

                          SAM
                  Testify            Not
LUKE   Testify    L: 24*, S: 24      L: 0*, S: 30
       Not        L: 30,  S: 0       L: 6,  S: 6

Sam's best responses (marked with *):

                          SAM
                  Testify            Not
LUKE   Testify    L: 24, S: 24*      L: 0, S: 30
       Not        L: 30, S: 0*       L: 6, S: 6

The first table marks Luke's best responses. There are two different scenarios he faces: Sam testifying, or Sam not testifying. Looking at COLUMN 1, where Sam testifies, Luke can either testify with a payoff of 24, or not testify with a payoff of 30. His best option is to testify, getting only 24 months of prison instead of 30. Looking at COLUMN 2, where Sam does not testify, Luke can either testify with a payoff of 0, or not testify with a payoff of 6. Luke will testify, since he will get 0 months of prison instead of 6. Since Luke will testify regardless of whether Sam testifies or not, it is his dominant strategy. Since the payoff from his dominant strategy is always higher than from his other strategy, it is a strongly dominant strategy. Sam follows the same logic in the second table, first looking at ROW 1, where Luke testifies, and finding he is better off testifying, and then at ROW 2, where Luke does not testify, and finding he is still better off testifying. Sam also has a strongly dominant strategy, since he will always testify, and testifying will always yield a higher payoff than not testifying. To find the solution for this game, look at both players' strategies together:

                          SAM
                  Testify             Not
LUKE   Testify    L: 24*, S: 24*      L: 0*, S: 30
       Not        L: 30,  S: 0*       L: 6,  S: 6

We can see that since both players will choose to testify, the outcome of this game will be the top‐left box, or both players getting 24 months of prison. Notice that if the players were to cooperate, they would not tell and only get 6 months in prison; each player acting in their own self‐interest causes them to get the inferior outcome of 24 months in prison. This game is supposed to serve as a metaphor for two firms in competition with each other. If the two players were Coke and Pepsi, perhaps they could cooperate to get higher profits; but each firm acting individually leads to lower profits for both.
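For readers who like to verify such tables mechanically, here is a small Python sketch (our illustration, not part of the text) that finds pure-strategy Nash equilibria by checking best responses cell by cell. Since the payoffs here are months in prison, they enter as negative numbers so that "higher is better":

    from itertools import product

    def pure_nash(payoffs):
        """payoffs[(r, c)] = (row player's payoff, column player's payoff)."""
        rows = {r for r, _ in payoffs}
        cols = {c for _, c in payoffs}
        equilibria = []
        for r, c in product(rows, cols):
            u_row, u_col = payoffs[(r, c)]
            # A cell is a Nash equilibrium if neither player can do better
            # by unilaterally switching their own choice.
            best_row = all(u_row >= payoffs[(r2, c)][0] for r2 in rows)
            best_col = all(u_col >= payoffs[(r, c2)][1] for c2 in cols)
            if best_row and best_col:
                equilibria.append((r, c))
        return equilibria

    prisoners_dilemma = {
        ("Testify", "Testify"): (-24, -24),
        ("Testify", "Not"):     (0, -30),
        ("Not", "Testify"):     (-30, 0),
        ("Not", "Not"):         (-6, -6),
    }
    print(pure_nash(prisoners_dilemma))   # [('Testify', 'Testify')]

The same function works for any two-player table in this chapter, as long as payoffs are entered so that larger numbers are better.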



Entry Game
This is a game where there is an incumbent firm, and a firm that is considering entry into the market. The incumbent has to decide whether or not to expand their business to meet demand. The entrant has to decide whether or not to enter the market. The payoffs in this game are profits in dollars.

                               ENTRANT
                       In                 Out
INCUMBENT   Expand     I: 20, E: -20      I: 50, E: 0
            Not        I: 30, E: 20       I: 80, E: 0

To provide a story to go with these payoffs, assume that it's inefficient for the incumbent to expand; they'd rather maintain the status quo than try to satisfy the entire market. If they don't expand and the entrant comes in, they will lose a lot of business. Finally, if the entrant comes in and the incumbent expands, there's excess capacity, but the incumbent will still make some profit due to brand recognition. Now let's look at the best responses:

                               ENTRANT
                       In                  Out
INCUMBENT   Expand     I: 20,  E: -20      I: 50,  E: 0*
            Not        I: 30*, E: 20*      I: 80*, E: 0

The incumbent has a strongly dominant strategy, since not expanding always provides a higher payoff. The entrant does not have a dominant strategy, since they want to enter only if the incumbent is not expanding. The solution to this game is not as immediate, since not every player has a dominant strategy. However, there is still a "reasonable" solution. From the incumbent's perspective, they'd rather not expand regardless of the entrant's strategy. So when the entrant looks at this game, it's reasonable to assume that the incumbent will never expand. Given that piece of information, the rational decision for the entrant is to enter the market. This process is called iterated elimination of dominated strategies, and it is not as definitive a solution as if both firms had strongly dominant strategies. Once the dominated strategy of expanding is eliminated from consideration, the entrant has a clear choice: to enter. The incumbent would prefer that the entrant not enter, so the incumbent could earn a profit of 80, but this



would require the entrant to believe the incumbent was expanding. This is possible in a sequential game, however, and will be covered after some more discussion of simultaneous games.

Battle of the Sexes
Suppose a couple is going out after work to either a basketball game or a show. They plan to talk by cell phone after work to decide whether to meet at the game or the show. When they get off work, cell service is out, and they can't talk. Each one has to decide where to go without communicating. The normal form of the game is shown in the table below.

                       GIRL
                Show             Game
BOY    Show     B: 3, G: 2       B: 1, G: 1
       Game     B: 0, G: 0       B: 2, G: 3

To explain the payoffs:

Assume the boy's favorite activity is the show, and the girl's favorite activity is the game. They would both rather be together at their least favorite activity than apart at their favorite activity. So, when they are together at the show (top-left), the boy gets one more unit of utility than the girl, since he likes the show. When they are together at the game, the girl gets one more unit of utility than the boy, since she likes the game. When they are apart but each at the activity they like, they each get one unit of utility. When they are apart and each at the activity they don't like, they both get 0. Let's analyze this game.

                       GIRL
                Show              Game
BOY    Show     B: 3*, G: 2*      B: 1, G: 1
       Game     B: 0,  G: 0       B: 2*, G: 3*

We see neither player has a dominant strategy. In some cases they would each like to go to the show, but in others they’d like to go to the game. Since the idea of dominance doesn’t get us anywhere in this game, it’s time to introduce some new terminology to cope with games such as the one above.



• Maxi­min or Secure or Safe: This is essentially a strategy that has the best downside or avoids the worst outcome. Looking at this game again, let’s see what both players’ secure strategies are.

                       GIRL
                Show              Game
BOY    Show     B: 3*, G: 2*      B: 1, G: 1
       Game     B: 0,  G: 0       B: 2*, G: 3*

Looking at the game from the boy’s perspective, we see that the worst thing that could happen to him if he chooses the show is he gets a payoff of 1, while the worst thing that happens if he chooses the game is he gets a payoff of 0. Thus, the boy’s secure strategy is to go to the show, to minimize his potential losses. Similarly, the girl’s worst payoff if she goes to the show is 0, while her worst payoff if she goes to the game is 1, so her secure strategy is to go to the game. Thus, if both players play their secure strategies, the girl will go to the game and the boy will go to the show. There is a problem, however, with the secure strategy in this game. We are assuming all players are rational, and want to maximize their expected utility. If the boy believes that the girl will play her safe strategy, he will anticipate her going to the game. If he thought this to be the case, he would rather go to the game because his utility would be one point higher. The girl could obviously go through the same thought process, which would lead them to the worst possible outcome of the boy going to the game and the girl going to the show. The point is that playing a secure strategy doesn’t always make sense. This leads to an idea that is at the heart of any “reasonable” solution to a game.

• Nash Equilibrium: A strategy pair where both players are simultaneously correctly anticipating the other's play and choosing their best response.

John Nash proved that any game whose basic building blocks can be written down as described above has at least one Nash equilibrium (possibly in mixed strategies, a concept introduced below). Any "reasonable" solution to a game must be a Nash equilibrium. Otherwise, either a player is not correctly anticipating their opponent's play, or they would prefer to choose a different strategy but are not doing so. However, not every Nash equilibrium need be a "reasonable" solution, as we will see. At this time, let's also introduce another term. A reaction function is simply a function that puts together all of a player's best responses. Let's look at a reaction function in the context of the battle of the sexes game above. From the boy's perspective, if the girl goes to the game, he will too, and if the girl goes to the show,



he will too. This is also the case for the girl. The Nash equilibria are where the reaction functions intersect. The graph below illustrates this.

                       GIRL
                Show              Game
BOY    Show     B: 3*, G: 2*      B: 1, G: 1
       Game     B: 0,  G: 0       B: 2*, G: 3*

The two cells where both payoffs are starred - (Show, Show) and (Game, Game) - are the Nash equilibria.

From before, we had shown that neither player had a dominant strategy, as their best responses depended on the other player's strategy. Their best responses are marked with asterisks in the table above. Since a Nash equilibrium occurs where the best responses intersect, there are two Nash equilibria in this game. This suggests there's no unique solution that game theory can offer for this game, even though there are two outcomes in which each player is playing their best response to the other. We can imagine that not all of the influences dictating the choices of the boy and girl are included in the payoffs. For example, the boy may simply want to act chivalrous and defer to the girl's wishes, regardless of his own individual payoff. In this way, social norms may create focal points, which are obvious solutions to those who know the richer context in which a game takes place. The following is an example of a focal point: imagine the game the girl wanted to see was the national championship. The Nash equilibria would be the same, but most likely the boy would know how much the girl wanted to see the game, and that would be the outcome. Really, of course, if such other factors affect the payoffs, we should include them in the game, and then the set of Nash equilibria might look different. But it may make sense to model a game based on some notion of "typical" payoffs and then wonder, outside the formal context of the model, what sorts of conditions might favor one choice over another in any particular situation. Referring back to the prisoner's dilemma and the entry game, there was only one cell in each where both players' best responses were chosen, and since those cells were the intersections of the reaction functions, they were the Nash equilibria.

Prisoner's Dilemma:

                          SAM
                  Testify             Not
LUKE   Testify    L: 24*, S: 24*      L: 0*, S: 30
       Not        L: 30,  S: 0*       L: 6,  S: 6

N.E.: (Testify, Testify)

Entry Game:

                               ENTRANT
                       In                  Out
INCUMBENT   Expand     I: 20,  E: -20      I: 50,  E: 0*
            Not        I: 30*, E: 20*      I: 80*, E: 0

N.E.: (Not, In)



Monitoring Game
Suppose Jen is an employee, and Eric is her manager. Jen can choose to work hard on any given day, or shirk (slack off). Eric can choose to check up on her, or not to.

                      JEN
               Hard              Shirk
ERIC   Check   E: 30, J: 20      E: 30, J: 0
       Not     E: 40, J: 20      E: 10, J: 30

To tell a story about the payoffs, imagine the profits of the business depend only on Jen's work. If she works hard, the business earns its normal profits. If Eric takes time out of his day to check up on Jen, it costs him something (in the form of opportunity cost). So, if he doesn't check up on Jen and she works hard, his payoff is 40, but if he checks, he loses 10 units in the form of time spent checking up, and his payoff is 30. Thus, he would rather she work hard without requiring supervision. If she shirks and he checks, suppose he can fix the problem, so the business still makes normal profits, but Jen is disciplined and gets a payoff of 0. Jen gets paid 20 (unless she is caught shirking) and receives an extra 10 units from shirking, as long as she isn't caught. Now let's mark the best responses.

                      JEN
               Hard               Shirk
ERIC   Check   E: 30, J: 20*      E: 30*, J: 0
       Not     E: 40*, J: 20      E: 10, J: 30*

Eric will check if he thinks Jen will shirk, but he won't if he thinks she'll work. Jen will work hard if she thinks Eric will check, but she'll shirk if she thinks he won't check. Thus, it seems as if this game has no Nash equilibrium, since the players' best responses never intersect. However, that is because there are different types of Nash equilibria. They are the following:

• Pure strategy N.E.: Both players make definitive choices.
• Mixed strategy N.E.: Each player chooses some probability associated with each response. The choices aren't definite; they are probabilistic.

Since there is no pure strategy N.E. for this game, there must be some probability associated with each of the responses. From Jen's perspective, if she works



hard, she will get 20 if Eric checks and 20 if Eric does not check. Her expected payoff from working hard is therefore

20fc + 20(1 − fc)

where fc is the probability that Eric checks (and 1 − fc is the probability that Eric doesn't check). Her expected payoff from shirking is

0fc + 30(1 − fc)

Based on Jen's estimate of Eric's probability of checking (fc), she can choose whichever response (working hard or shirking) results in the higher expected payoff. If the first expression is larger than the second, she will work hard; if it's smaller, she will shirk. Now, it's important to understand that if the probability of checking is very low, Jen's expected payoff from shirking will be higher, and Jen will always shirk. If the probability that Eric checks is very high, Jen will always work hard. The only time Jen will randomize her response is when the two expected payoffs are equal. She randomizes in order to "fool" Eric, because if she did not randomize her response, Eric would know how to react every time. To find when the expected payoffs are equal, we just set the two expressions equal:

20fc + 20(1 − fc) = 0fc + 30(1 − fc)
20fc + 20 − 20fc = 30 − 30fc
20 = 30 − 30fc
30fc = 10
fc = 1/3

Therefore, in equilibrium the probability that Eric checks up on Jen is 33%. If it were lower than 33%, Jen would always shirk. If it were higher, Jen would always work hard. A probability of 33% keeps Jen willing to randomize her response. To find Jen's probability of working hard, we look at Eric's payoffs. Setting his expected payoff from checking equal to his expected payoff from not checking, where fh is the probability that Jen works hard:

30fh + 30(1 − fh) = 40fh + 10(1 − fh)
30 = 40fh + 10 − 10fh
30fh = 20
fh = 2/3

Thus the probability that Jen works hard is 67%. The interpretation of these two probabilities is the following: if Jen thinks there's a 33% chance that Eric will check on her, she's happy to work hard 67% of the time; and if Eric thinks Jen will work hard 67% of the time, he is happy checking up on her 33% of the time. The only situation in which both players are happy with their strategies and are guessing right about the



other player's strategy is when both are randomizing their responses with exactly these probabilities. The reason randomizing is important in this game is that players can exploit predictability. Here, if Jen knew Eric was just too lazy to check, she could exploit that and shirk. That's why, in equilibrium, both players randomize to maximize their expected payoffs. Note that being unpredictable (randomizing your response) doesn't necessarily mean playing the game with a 50% chance either way. We just solved the game above and found that most of the time (67%) Jen will work hard, and only seldom (33%) will Eric check up on her. This is because the payoffs of the different outcomes determine how tempting it is to shirk or to check up. Imagine, for example, that Jen's payoff from shirking without being caught increases. To keep Jen willing to randomize, Eric's probability of checking must rise in the new equilibrium; perhaps surprisingly, Jen's own probability of working hard is pinned down by Eric's payoffs, so it would not change.
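The two indifference conditions above are easy to hand to a computer algebra system. Here is a minimal Python/sympy sketch (ours, not from the text):

    import sympy as sp

    fc, fh = sp.symbols("fc fh")  # P(Eric checks), P(Jen works hard)

    # Jen is indifferent between working hard and shirking:
    jen = sp.Eq(20 * fc + 20 * (1 - fc), 0 * fc + 30 * (1 - fc))
    # Eric is indifferent between checking and not checking:
    eric = sp.Eq(30 * fh + 30 * (1 - fh), 40 * fh + 10 * (1 - fh))

    print(sp.solve(jen, fc))   # [1/3]  -> Eric checks a third of the time
    print(sp.solve(eric, fh))  # [2/3]  -> Jen works hard two thirds of the time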

One Shot Sequential Move Games
A sequential game has a first mover and a second mover. The important point is that, since the first mover knows the second mover will react to whatever the first mover chooses, the first mover has a degree of control over how the game will be played. Let's look at the entry game again, but in sequential form.

                               ENTRANT
                       In                  Out
INCUMBENT   Expand     I: 20,  E: -20      I: 50,  E: 0*
            Not        I: 30*, E: 20*      I: 80*, E: 0

Earlier, we figured out the Nash equilibrium to be the incumbent not expanding and the entrant entering; this is because the incumbent has a dominant strategy to not expand, and the entrant relying on this knows his best option is to enter. What if, however, there was some way that the incumbent could promise the entrant that he would expand? The incumbent would prefer this, because he knows that it would keep the entrant out, and he could get a payoff of 50 (top‐right cell). The problem is that expanding is not a credible threat; this is because this is a one‐ shot simultaneous move game, and like we’ve already said, the incumbent has a dominant strategy of not expanding. Thus, the entrant will never believe a threat of expansion. This is where the possibility of a sequential game comes in. If the incumbent is the first mover, he has the advantage of being in a position to convince the entrant that he is committed to expanding. Note that this does not mean the incumbent



actually has to expand during his move; he just has to credibly and irrevocably commit to expanding. To model the sequential game where the incumbent moves first, we need to look at the entrant’s strategies. These become more complex, since they are conditional on the decision made by the incumbent.

    Incumbent's 1st move:     Expand    Not
    Strategy (In, In):        In        In
    Strategy (In, Out):       In        Out
    Strategy (Out, In):       Out       In
    Strategy (Out, Out):      Out       Out

Each row represents a different strategy, conditioned on what the incumbent does in the first move. Think of each row as a set of instructions about what to do in each situation. Let's represent all four of these strategies in a normal form table, as we have for the previous games.

                            ENTRANT
               In, In           In, Out          Out, In          Out, Out
INCUMBENT
  Expand       I: 20, E: -20    I: 20, E: -20    I: 50, E: 0      I: 50, E: 0
  Not          I: 30, E: 20     I: 80, E: 0      I: 30, E: 20     I: 80, E: 0

For the column headings (the entrant’s strategies), the first word represents the response given the incumbent expands, and the second word (after the comma) represents the response given the incumbent does not expand. These payoffs come from looking at the initial table of the game. The far left column is the same as the original left column, and the far right is the same as the original right column. The difference is the middle two columns, which represent the entrant’s ability to choose what to do based on the incumbent’s first move. Let’s look at the best responses:



                            ENTRANT
               In, In            In, Out           Out, In           Out, Out
INCUMBENT
  Expand       I: 20, E: -20     I: 20, E: -20     I: 50*, E: 0*     I: 50, E: 0*
  Not          I: 30*, E: 20*    I: 80*, E: 0      I: 30, E: 20*     I: 80*, E: 0

We see the players' best responses intersect twice, and thus there are two Nash equilibria. However, one of them doesn't make much sense. The first column (In, In) means the entrant will enter no matter what. Since this is a sequential one-shot game, it doesn't make sense for the entrant to play this strategy: if the incumbent expanded, the entrant would lose 20. Looking at the other strategy involved in a Nash equilibrium, (Out, In), we see that the entrant gets either 0 or 20, which is always at least as good as (In, In). Therefore, the strategy (Out, In) weakly dominates the strategy (In, In), and we can conclude that (In, In) will likely not be played. Until now we've been looking at games in normal form, as above. Since we're dealing with sequential games, we can also write them out as a game tree, or in extensive form:

Incumbent
├─ Expand → Entrant
│             ├─ In:  I: 20 | E: -20
│             └─ Out: I: 50 | E: 0
└─ Don't  → Entrant
              ├─ In:  I: 30 | E: 20
              └─ Out: I: 80 | E: 0

The way we find the solution to this game is through backwards induction. Looking at the top two branches, the scenario where the incumbent expands, the entrant has a choice between a payoff of -20 (In) and 0 (Out). Thus, we can cross off -20 (the entrant entering).

[Figure: the game tree repeated, with the entrant's In branch under Expand crossed out.]


Now looking at the bottom two branches, the scenario where the incumbent does not expand, the entrant has two possible payoffs, 20 and 0. He chooses 20, so we can eliminate the Out branch. This gives the entrant's best response to each of the incumbent's decisions. Remember, when we first introduced sequential games, we said it's the first mover who has control over the final outcome of the game. We can imagine that the incumbent goes through the same exercise we just did, and knows how the entrant will respond to each decision. Thus, the incumbent essentially has a choice between 50 and 30. He chooses 50, so he will expand, knowing that the entrant will stay out, giving us the solution to the game. This solution is called the sub-game perfect Nash equilibrium (or SPNE); it is also referred to as the rollback equilibrium. The main assumption behind SPNE is that every player acts rationally at every fork in the tree, whether or not play actually reaches that particular fork. This is why we were able to eliminate two of the entrant's four strategies, which led us to eliminate one of the incumbent's two strategies. Think back to the table that gave us two Nash equilibria. We now see the first one (I: 30, E: 20) didn't make sense, because it involved the entrant acting irrationally at a node of the game tree: he would have had to enter the market even if the incumbent expanded, taking a payoff of -20 instead of 0. This is why we call it the SPNE; it is perfect in the sense that every player acts rationally, so there are no non-credible threats. This is an important concept when using the game tree.
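Backwards induction is mechanical enough to code directly. The following Python sketch (our own, with the tree hard-coded from the entry game) rolls the tree back and reports the SPNE path:

    def rollback(node):
        """Return (payoffs, path) chosen by rational play at every node."""
        if isinstance(node, tuple) and not isinstance(node[0], str):
            return node, []                  # leaf: (incumbent payoff, entrant payoff)
        player, choices = node
        index = 0 if player == "Incumbent" else 1
        best = None
        for choice, subtree in choices.items():
            payoffs, path = rollback(subtree)
            # The mover keeps whichever branch gives them the higher payoff.
            if best is None or payoffs[index] > best[0][index]:
                best = (payoffs, [choice] + path)
        return best

    entry_game = ("Incumbent", {
        "Expand": ("Entrant", {"In": (20, -20), "Out": (50, 0)}),
        "Don't":  ("Entrant", {"In": (30, 20),  "Out": (80, 0)}),
    })
    print(rollback(entry_game))   # ((50, 0), ['Expand', 'Out'])

The output matches the SPNE found above: the incumbent expands, and the entrant stays out.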

More Than Two Strategies and More Than Two Players
Imagine players A and B are competing with each other, and each strategy represents how hard they compete. Player A has strategies Top, Middle, and Bottom. Player B has strategies Left, Middle, and Right. The game is illustrated below, with each player's best responses marked.

                              B
              Left               Middle              Right
     Top      A: 0,  B: 10       A: 10, B: 0         A: 30*, B: 20*
A    Middle   A: 10, B: 20*      A: 20*, B: 20*      A: 0,  B: 10
     Bottom   A: 30*, B: 10      A: 10, B: 20*       A: 10, B: 0

The two cells where both payoffs are starred - (Top, Right) and (Middle, Middle) - are the Nash equilibria.

Looking at player A: If B chooses left, A chooses bottom. If B chooses middle, A chooses middle. If B chooses right, A chooses top. Looking at player B: If A chooses top, B chooses right. If A chooses middle, B chooses left or middle. If A chooses bottom, B chooses middle.



We see there are two Nash equilibria. We would need more information to solve this game. Perhaps one of the two solutions is a focal point under some circumstances. Perhaps A is a market leader, and everyone expects them to play Top. Perhaps, since player B does not care which equilibrium is played, they both expect A to play Top because A does care. The main point here is that there is nothing special about games with two strategies. The same solution concepts apply when there are many strategies; they are just more complicated to apply. In particular, iterated dominance becomes more complex. To apply that technique, we would first identify any strategies that are always worse (strongly dominated), or always no better and sometimes worse (weakly dominated), compared to some combination of the player's other strategies. These dominated strategies are then eliminated for both players, and the new, reduced game is examined. Once again, dominated strategies are eliminated; that is why this technique is called iterated dominance. This continues until none of the remaining strategies is dominated. If only one strategy is left for each player, the game has a solution by iterated dominance. Now, suppose there were three players in the above game instead of just two. We could imagine that player A still chooses a "row," player B chooses a "column," and there are three different game tables like the one above, not just one, with the third player, player C, choosing the "table." The idea would remain the same, although it would be more cumbersome to put into practice.
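As a concrete illustration of iterated dominance, here is a Python sketch of ours that checks domination by pure strategies only (the text notes that mixtures of strategies can also dominate; we skip that for brevity), applied to the simultaneous entry game:

    def iterated_dominance(rows, cols, u_row, u_col):
        """u_row[r][c] and u_col[r][c] give each player's payoffs."""
        changed = True
        while changed:
            changed = False
            for r in list(rows):
                # Remove r if some other row is strictly better in every column.
                if any(all(u_row[r2][c] > u_row[r][c] for c in cols)
                       for r2 in rows if r2 != r):
                    rows.remove(r)
                    changed = True
            for c in list(cols):
                # Remove c if some other column is strictly better in every row.
                if any(all(u_col[r][c2] > u_col[r][c] for r in rows)
                       for c2 in cols if c2 != c):
                    cols.remove(c)
                    changed = True
        return rows, cols

    # Simultaneous entry game: rows = incumbent, cols = entrant.
    u_inc = {"Expand": {"In": 20, "Out": 50}, "Not": {"In": 30, "Out": 80}}
    u_ent = {"Expand": {"In": -20, "Out": 0}, "Not": {"In": 20, "Out": 0}}
    print(iterated_dominance(["Expand", "Not"], ["In", "Out"], u_inc, u_ent))
    # -> (['Not'], ['In']): Expand is eliminated first, and then Out.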



Chapter 12 Terminology
The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Strategic Interdependence - A situation in which, in a market with a few players, each firm sets prices and quantities based on the other firms' behavior. Each firm has some degree of influence over market outcomes, so each firm's best decision depends on what the others do.

Game Theory - The study of behavior in strategic situations. It allows economists to analyze how competing firms make decisions based on all available information. It is a search for the most reasonable way a player would act when strategic decisions are weighed against one another.

Simultaneous Game - A game in which both players move at the same time.

Sequential Game - A game in which there is a first and a second mover, and the second mover knows how the first mover behaved.

One-Shot Game - A game that is not repeated.

Repeated Game - A game that is played many times.

Prisoner's Dilemma - A one-shot, simultaneous move game in which both players would do better by cooperating, but in which each player's dominant strategy leads to an equilibrium outcome that is worse for both.

Dominant Strategy - A strategy that is a player's best choice regardless of what the opponent plays.

Strongly Dominant Strategy - A dominant strategy that always has a strictly higher payoff than the player's other strategies.

Weakly Dominant Strategy - A dominant strategy that always has at least as high a payoff as the player's other strategies.

Iterated Elimination of Dominated Strategies - A solution technique that involves eliminating dominated strategies, treating the remaining strategies as a new game, and repeating until no further strategies can be eliminated.

Maxi-Min Strategy - Also called a secure or safe strategy; choosing the strategy whose worst possible outcome is best. The player minimizes their potential losses.

Nash Equilibrium - A strategy profile in which each player correctly anticipates the others' play and chooses a best response. If there is a best or most reasonable solution to a game, each player must be choosing their best response to the other. Every game that can be written in normal form has at least one Nash equilibrium (possibly in mixed strategies). It occurs where the players' reaction functions cross.

Reaction Function - A function that puts together all of a player's best responses to the opponent's decisions.



Focal Point - An outcome that stands out as the obvious solution to players who know the richer context in which the game takes place, even though those factors lie outside the formal model.

Pure Strategy Nash Equilibrium - A solution in which both players make definitive choices.

Mixed Strategy Nash Equilibrium - A solution in which each player chooses some probability associated with each response. Choices aren't definite; they are probabilistic.

Non-Credible Threat - An empty threat made by a rational player. It is not in the player's best interest to carry out the threat; opponents know this and therefore never believe it.

Extensive Form - A game tree used to represent sequential games. Backwards induction is used to find the solution.

Backwards Induction - A method for solving a sequential move game in which the second mover's best responses are found first, and then the first mover's best choice is found taking those responses into account.

Sub-Game Perfect Nash Equilibrium (SPNE) - Also called the rollback equilibrium; a solution to a sequential game in which players act rationally at every fork in the tree, whether or not play actually reaches that particular fork.



Chapter 13
One Shot Games with Continuous Strategies
When a firm makes a choice about the price to charge or the quantity to produce, it faces a continuum of choices, not just a small discrete set. How do the arguments of the previous chapter extend to such cases? The basic ideas are the same. Each player makes its best guess about what other players will do, and chooses its best response. In a Nash equilibrium, all players choose best responses to the strategies of all other players. We begin by considering a simultaneous move game, and then sequential moves, in an extended example. After that, we briefly consider a general version of a game with continuous strategies to show that the techniques and intuition applied in the example apply to any such game.

An Extended Example – Advertising Simultaneously
In later chapters, we will take up the choices of output and price in situations of strategic product market competition. To avoid monotony, we use a different example to illustrate the ideas here: the decision of how much to advertise when each firm's advertising has significant effects on the sales of the other firms. Kevin's Big Store and Morgan's Monster Market have to decide how much to advertise. Assume product prices are predetermined and each customer will generate $10 in profit for whichever firm they purchase from, before advertising expenses are subtracted. Mainly, we make that assumption to simplify things. If you want a more concrete setting, suppose the competing firms are retailers who are bound by contracts with manufacturers to charge the manufacturer's suggested retail price (MSRP). Think of advertising as producing a series of messages from the firms to potential customers. Let aK and aM represent advertising expenditures by Kevin and Morgan, respectively. For our example, assume the quantity sold by Kevin is

qK = 10 + 0.5aK − 0.25aM − 0.005(aK + αaM)²   (13.1)

and the quantity sold by Morgan is

qM = 10 + 0.5aM − 0.25aK − 0.005(aM + αaK)².   (13.2)

What sort of situation might demand functions like these represent? First, if neither advertises, each has sales of 10. However, if one firm does advertise, they initially gain sales at a rate of 0.5 per unit of advertising, but they reduce the other firm’s sales by 0.25 units. Thus, advertising both draws in new customers and takes some away from the other firm. Diminishing returns are reflected in the quadratic term – they can’t bring in new customers indefinitely or take customers away from the other firm indefinitely. As Kevin advertises more, holding Morgan’s advertising constant, the marginal return to his advertising falls. However, the quadratic term in



Kevin's demand depends on both Kevin's advertising and Morgan's. That means changes in Morgan's advertising affect the rate at which the marginal returns to Kevin's advertising fall. Perhaps an increase in Morgan's advertising causes the marginal productivity of Kevin's advertising to fall more slowly. That might occur if the customers Morgan gains by advertising, including new ones brought into the market and customers lured away from Kevin, are more easily lured away from her by Kevin's advertising, and vice versa. Of course, it is also possible that an increase in Morgan's advertising might cause the marginal returns to Kevin's advertising to diminish faster. For example, Morgan's advertising may create brand loyalty and reduce Kevin's ability to lure customers away with his own advertising. As we will later see, the effect of one player's actions on the marginal productivity of the other player's actions is CRUCIAL in determining the character of strategic interactions. In this example, whether Morgan's advertising increases or decreases the productivity of Kevin's advertising depends on the parameter α. If α<0, an increase in Morgan's advertising means the marginal returns to Kevin's advertising diminish more slowly – so an increase in Morgan's advertising increases the marginal productivity of Kevin's advertising. If α>0, an increase in Morgan's advertising means the marginal returns to Kevin's advertising diminish more rapidly – so an increase in Morgan's advertising decreases the marginal productivity of Kevin's advertising. With these demand functions and a predetermined price such that each sale brings in $10 in profit before advertising costs are subtracted, profits net of advertising are

πK = 10(10 + 0.5aK − 0.25aM − 0.005(aK + αaM)²) − aK   (13.3)

and

πM = 10(10 + 0.5aM − 0.25aK − 0.005(aM + αaK)²) − aM.   (13.4)

Whatever his guess about Morgan’s advertising level, Kevin will choose his own advertising to maximize his profit. Taking the derivative of (13.3) gives:

dπK/daK = 10(0.5 − 0.01(aK + αaM)) − 1 = 0.   (13.5)

We will refer to the derivative of Kevin’s payoff with respect to his advertising as the net marginal benefit, NMB. If the NMB is positive (negative), Kevin should increase (decrease) advertising. At the maximum, the NMB is 0. This can readily be rearranged as follows:

5 − 0.1aK − 0.1αaM = 1.   (13.6)

This sets the marginal benefit of an advertising message equal to marginal cost ($1). Solving equation (13.6) for Kevin’s advertising level gives Kevin’s reaction function, or, his best response function, RK(aM). The reaction function gives the level



of advertising that will maximize his profit for any level of Morgan’s advertising. Solving gives

aK = RK(aM) = 40 − αaM.   (13.7)

If Morgan does not advertise, Kevin spends $40. The slope of a reaction function has important strategic implications, and we can find it by simply taking the derivative. In this case, dRK/daM (that is, daK/daM) is equal to −α. For every additional dollar Morgan spends, Kevin spends an additional −$α. If α is negative, Kevin's reaction function slopes up; this is shown in the left panel of the figure below. If α is positive, Kevin's reaction function slopes down; this is shown in the right panel.

[Figure: two panels plotting RK(aM) in (aM, aK) space with intercept 40; the reaction function slopes up in the left panel and down in the right panel.]

It is important to clearly understand what determines the slope of a reaction function. At Kevin's profit-maximizing advertising level for any given level of Morgan's advertising, NMBK is 0. That is, NMBK is zero at every point on Kevin's reaction function. So, beginning from a point on Kevin's reaction function, if Morgan's advertising increases the NMB of Kevin's advertising, then when she advertises more, Kevin's NMB becomes positive. To maximize profit, he must then advertise more, increasing advertising until once again NMBK is 0. Thus the reaction function slopes up. On the other hand, if Morgan's advertising decreases the NMB of Kevin's advertising, then when she advertises more, Kevin's NMB becomes negative if we began at a point on the reaction function (where NMBK is 0). To maximize profit, he must advertise less, decreasing advertising until NMBK is 0. Thus the reaction function slopes down. We can find the impact of Morgan's advertising on the NMB of Kevin's advertising by taking the derivative of equation (13.5). Doing so gives



dNMBK/daM = −10(0.01α) = −0.1α.⁶   (13.8)

When α < 0, this derivative is positive, and Kevin's reaction function slopes up. If, however, Morgan's advertising reduced the NMB of Kevin's advertising (α > 0), the derivative would be negative and his reaction function would slope down, not up. Taking the derivative of Morgan's profit function and solving for her reaction function works the same way. Since her profit function is basically the same as Kevin's, and the calculus and algebra work the same way, let's just skip to the reaction function:

aM = RM(aK) = 40 − αaK.   (13.9)
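As a check on the calculus, a short Python/sympy sketch (ours, not part of the text) re-derives Kevin's reaction function (13.7) from his profit function (13.3):

    import sympy as sp

    aK, aM, alpha = sp.symbols("a_K a_M alpha")
    qK = 10 + 0.5 * aK - 0.25 * aM - 0.005 * (aK + alpha * aM) ** 2
    piK = 10 * qK - aK

    NMB = sp.diff(piK, aK)                 # net marginal benefit, as in (13.5)
    RK = sp.solve(sp.Eq(NMB, 0), aK)[0]    # Kevin's best response to aM
    print(sp.simplify(RK))                 # 40 - alpha*a_M (in float form)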

Above, we defined a Nash Equilibrium technically as the point of intersection of two reaction functions. This is shown in the figure below, with aKne and aMne representing the Nash Equilibrium advertising levels of Kevin and Morgan, respectively. In the panel on the left, the solution is for each to spend more than $40, because each responds to the other's advertising by advertising more. In the panel on the right, they each spend less than $40, because each responds to the other's advertising by advertising less.

[Figure: two panels in (aM, aK) space showing RK(aM) and RM(aK), each with intercept 40, crossing at (aMne, aKne). In the left panel the reaction functions slope up and aKne, aMne > 40; in the right panel they slope down and aKne, aMne < 40.]

When the reaction functions slope up, the players' strategies - here, how much to advertise - are called strategic complements. This is the case in the left panel. When the reaction functions slope down, the players' strategies are called strategic substitutes. This is the case in the right panel. To solve for the numeric value of advertising in the Nash Equilibrium, we need to specify a value for α. For purposes of an example, let's assume α = −0.5. In that case, the reaction functions are

6 This is just the derivative, with respect to Morgan's advertising, of the derivative of Kevin's profit with respect to Kevin's advertising – that is, it is the cross partial derivative of Kevin's profit function, ∂²πK/∂aK∂aM.



RK(aM) = 40 + 0.5aM   (13.10)

and

RM(aK) = 40 + 0.5aK,   (13.11)

respectively. Suppose Kevin expected Morgan to spend $40. Then he would want to spend $60 (40 + 0.5⋅40). But if he spent $60, Morgan would want to spend $70, in which case Kevin would want to spend $75, and so on. If instead Kevin expected Morgan to spend $120, he would spend $100. But if he spent $100, Morgan would spend $90, in which case Kevin would spend $85, and so on. If you continued with this process, it would converge to $80. The only advertising level where both correctly anticipate their opponent's play and react optimally is when both spend $80.

[Figure: RK(aM) and RM(aK) plotted in (aM, aK) space, each with intercept 40, intersecting at (80, 80).]

Equations (13.10) and (13.11) provide two equations in two unknowns. To find the solution algebraically instead of graphically, substitute one into the other and solve. Substituting for aM in Kevin's reaction function gives:

aK = 40 + 0.5(40 + 0.5aK) = 60 + 0.25aK
0.75aK = 60
aK = 80   (13.12)

Substituting the solution for Kevin’s advertising into Morgan’s reaction function gives the solution for Morgan’s advertising,

aM = 40 + 0.5(80) = 80.   (13.13)
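The guessing process described above can be mimicked in a few lines of Python (our sketch, with α = −0.5). Starting from any initial guess, iterating the two reaction functions converges to the Nash equilibrium of $80 each:

    def best_response(other):
        # R(a) = 40 - alpha*a with alpha = -0.5
        return 40 + 0.5 * other

    aM = 40.0                        # initial guess about Morgan's spending
    for step in range(40):
        aK = best_response(aM)       # Kevin reacts to Morgan
        aM = best_response(aK)       # Morgan reacts to Kevin
    print(round(aK, 4), round(aM, 4))   # -> 80.0 80.0

The iteration converges because each round of reactions moves the guesses halfway toward the fixed point at (80, 80).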

This game is symmetric. For our purposes, that simply means the only difference between the players is their names. If you take Kevin's profit function and replace the Ks with Ms and the Ms with Ks, you get Morgan's profit function. When a game is symmetric, it will generally have a symmetric equilibrium, that is, an equilibrium where the players play the same strategy.7

7 At least, this is true for the most part and for all the cases we will be interested in. There are some technical issues involved in establishing this with complete generality.

In a game where the reaction functions can cross only one time, like this one, we therefore know that there is only one solution and that it will involve both players playing identical strategies. Symmetry can be very helpful when analyzing and solving games. This one was easy to solve with simple substitution, but that is not always the case. However,



symmetry means we know that in the solution aM=aK. Once we find the expression for NMB and equate it to 0, we can use this fact to solve immediately for the equilibrium. In this case, NMB=0 and aM=aK become two equations in two unknowns, with the added benefit that one is very simple which makes solving easy. For example, substituting ‐0.5 for α in equation (13.5) and applying symmetry gives

$$10\left(0.5 - 0.01(a - 0.5a)\right) - 1 = 0 \;\Rightarrow\; 5 - 0.05a - 1 = 0 \;\Rightarrow\; 0.05a = 4 \;\Rightarrow\; a = 80, \qquad (13.14)$$

where a, sans subscript, represents the equilibrium level of advertising common to both players. So, in the Nash Equilibrium in our example, each spends 80 on advertising. Plugging this into the demand and profit functions, we find each sells 22 units and makes a profit of $140 (10·22 - 80). If neither advertised, each would sell 10 units and make a profit of $100. In this example, advertising results in higher profits. However, each firm's advertising has a negative effect on the other's profits. Therefore, if the two firms cooperated with one another, or colluded, profit would be higher. To see this, calculate what would happen if both advertising levels were chosen jointly, or cooperatively, to maximize total profit. Total profit is just the sum of the two firms' profits.

$$\pi_T = \pi_K + \pi_M$$

$$\pi_T = 10\left(10 + 0.5\,a_K - 0.25\,a_M - 0.005(a_K - 0.5\,a_M)^2\right) - a_K + 10\left(10 + 0.5\,a_M - 0.25\,a_K - 0.005(a_M - 0.5\,a_K)^2\right) - a_M$$

$$= 10\left(20 + 0.25\,a_K + 0.25\,a_M - 0.005(a_K - 0.5\,a_M)^2 - 0.005(a_M - 0.5\,a_K)^2\right) - a_K - a_M \qquad (13.15)$$

To maximize, take the partial derivatives, set them equal to zero, and solve. Taking the derivative with respect to Kevin's advertising gives:

$$\frac{\partial \pi_T}{\partial a_K} = 10\left(0.25 - 0.01(a_K - 0.5\,a_M) - 0.01(a_M - 0.5\,a_K)(-0.5)\right) - 1 = 0. \qquad (13.16)$$

The derivative with respect to Morgan's advertising is the mirror image of the above. In the solution to this cooperative problem, advertising will be the same for each firm, that is, aK = aM, due to the symmetric way each firm's advertising enters the problem. We can use that to simplify the solution process. Working from equation (13.16) and letting a represent the common advertising level gives:

$$2.5 - 0.1\,a_K + 0.05\,a_M + 0.05\,a_M - 0.025\,a_K - 1 = 0 \;\Rightarrow\; 0.025\,a = 1.5 \;\Rightarrow\; a = 60. \qquad (13.17)$$


If both spend 60, both sell 20.5 units and make a profit of $145. Yet, if either firm unilaterally advertises at a level of $60, the other firm has an incentive to respond by spending $70 on advertising to maximize its own profit. So, the cooperative solution is not an equilibrium, and there is an aspect of the prisoner's dilemma in this advertising game.

Suppose Kevin developed some cost advantage so that he makes $12 per unit sold instead of $10. How would the game change? First, it would no longer be symmetric, and we could not use symmetry to simplify finding a solution. But we can use our understanding of reaction functions to offer a general analysis of how the equilibrium changes without solving explicitly for the new equilibrium. Regardless of the level of Morgan's advertising, the NMB of Kevin's advertising increases. That means his reaction function will shift up - for every level of Morgan's advertising, he spends more. This is shown in the figure below. RK0 and RM0 are the initial reaction functions, and aK0 and aM0 are the initial Nash Equilibrium values. RK1 represents Kevin's reaction function after the decrease in his cost, and aK1 and aM1 are the new Nash Equilibrium values. In the left panel, advertising levels are strategic complements. When Kevin's reaction function shifts up and he advertises more, Morgan responds by advertising more as well, which drives a further increase in Kevin's advertising in the new equilibrium. So, both firms advertise more in the Nash equilibrium. The indirect, or strategic, effect of the change in Kevin's cost is the increase in Morgan's advertising. This strategic effect undoes some of the increase in Kevin's profit from the direct effect of his cost reduction.

[Figure: two panels plotting aK against aM. Left panel: upward-sloping reaction functions; Kevin's shifts up from RK0 to RK1, moving the equilibrium from (aM0, aK0) to (aM1, aK1), with both advertising levels higher. Right panel: downward-sloping reaction functions; Kevin's shift up moves the equilibrium so that aK1 > aK0 while aM1 < aM0.]

In the right panel, advertising levels are strategic substitutes. When Kevin’s reaction function shifts up and he advertises more, Morgan responds by advertising less, which induces a further increase in Kevin’s advertising, and so on. So, in the new Nash equilibrium Kevin advertises more and Morgan advertises less. The strategic effect of a decrease in Morgan’s advertising adds to the increase in Kevin’s profit due to the direct effect of his cost reduction. Kevin’s increase in advertising in effect crowds out some of Morgan’s.



*****It would be a useful exercise for the reader to calculate the Nash Equilibrium of the non-symmetric game in which Kevin makes $12 per customer and Morgan makes only $10.*****
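For readers checking their work on this exercise, here is a minimal sketch (ours, not the text's) that solves the asymmetric game by iterating best responses. It assumes the same demand specification as above; each reaction function comes from setting that player's NMB to zero with the new margins:

    # Asymmetric advertising game: Kevin earns $12 per unit, Morgan $10.
    # Setting margin*(0.5 - 0.01*(a_own - 0.5*a_other)) - 1 = 0 and solving
    # for a_own gives each player's (linear) reaction function.

    def reaction(a_other, margin):
        return (0.5 - 1.0 / margin) / 0.01 + 0.5 * a_other

    a_K, a_M = 80.0, 80.0            # start from the old symmetric equilibrium
    for _ in range(200):             # iterate best responses to convergence
        a_K, a_M = reaction(a_M, 12), reaction(a_K, 10)

    print(round(a_K, 2), round(a_M, 2))   # -> 82.22 81.11

Note this matches the strategic-complements prediction above: both firms advertise more in the new equilibrium, with Kevin's increase the larger one.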

An Extended Example Continued – Advertising Sequentially

Return to the original demand functions where we had not yet specified a value for α, and suppose Morgan moves first. Will she advertise more or less, and how will Kevin respond? As usual in a sequential move game, we begin at the end. First, suppose Kevin's reaction function slopes up. Then, if Morgan advertises more (less) in the first stage, Kevin sees her move and responds by advertising more (less) as well in the second stage. Anticipating this at the first stage, Morgan should ask herself the following question - "Do I want Kevin to advertise more or less?" Since Kevin's advertising reduces Morgan's profit, she wants him to advertise less. At the simultaneous play equilibrium, the NMB of Morgan's advertising is 0. That means if she spends just a little more or a little less, it has no direct impact on her profit. It does have an important indirect effect, though. If she spends less, since she moves first, Kevin will see that and spend less in turn. This will increase Morgan's profit. It will also increase Kevin's. In fact, Kevin's profit increases by more than Morgan's when Morgan moves first. Why? Because he could obtain the same increase in profit by adopting the same advertising level as Morgan, but he need not do so. Instead, since he moves second, he is free to choose the advertising level that maximizes his profit, given that Morgan has advertised less. If he makes a different choice, it must be because it results in greater profit. Thus, his increase in profit must be at least as big as Morgan's, and is typically larger.

Now suppose Kevin's reaction function slopes down. If Morgan advertises more at the first move, Kevin will see that and respond by advertising less at the second move. Since Morgan wants Kevin to advertise less, she should advertise more. Again, at the simultaneous play equilibrium, the NMB of Morgan's advertising is 0. That means if she spends just a little more, it has no direct impact on her profit. The indirect effect is important - if she spends more at the first move, Kevin will see that and spend less in turn at the second move. This will increase Morgan's profit. Since Morgan advertises more, Kevin's profit is lower. Thus, there is a clear first mover advantage when advertising levels are strategic substitutes.

To work an example, let's return to the given demand functions and again assume α = -0.5. Morgan can anticipate that Kevin's reaction function is RK(aM) = 40 + 0.5aM. Knowing that Kevin will respond this way at the second move after observing her choice at the first move, she is free to, in effect, choose the point on his reaction function that she prefers through her choice of advertising. So, we can simply substitute this reaction function into her profit function for aK and then maximize her profit. Her profit function becomes

$$\pi_M = 10\left(10 + 0.5\,a_M - 0.25(40 + 0.5\,a_M) - 0.005\left(a_M - 0.5(40 + 0.5\,a_M)\right)^2\right) - a_M. \qquad (13.18)$$


This simplifies to

$$\pi_M = 10\left(0.375\,a_M - 0.005(0.75\,a_M - 20)^2\right) - a_M. \qquad (13.19)$$

Maximizing gives

$$\frac{d\pi_M}{da_M} = 10\left(0.375 - 0.01(0.75\,a_M - 20)(0.75)\right) - 1 = 0. \qquad (13.20)$$

Solving equation (13.20), we find Morgan spends 75.56 on advertising. This is less than the 80 spent in the simultaneous move game. That induces Kevin to spend less as well. From Kevin's reaction function, we find he would spend 77.78 (40 + 0.5·75.56). Plugging those values into the profit functions, we find moving first increased Morgan's profit from 140 to 140.55. Kevin's profit, however, increased from 140 in the simultaneous play version to 142.22 - consistent with the argument above that the second mover gains even more.
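These numbers are easy to verify. Below is a minimal sketch (ours, with the same demand assumptions as above) in which Morgan, as the first mover, maximizes her profit over Kevin's reaction function; a fine grid search stands in for solving the first-order condition:

    # Sequential play: Morgan chooses a_M first, Kevin then plays his reaction.

    def reaction_K(a_M):
        return 40 + 0.5 * a_M

    def q(a_own, a_other):
        return 10 + 0.5 * a_own - 0.25 * a_other - 0.005 * (a_own - 0.5 * a_other) ** 2

    def profit_M_leader(a_M):
        a_K = reaction_K(a_M)            # Kevin responds optimally at stage two
        return 10 * q(a_M, a_K) - a_M

    best_aM = max((a / 100 for a in range(20001)), key=profit_M_leader)
    best_aK = reaction_K(best_aM)
    print(best_aM, round(best_aK, 2))                     # -> 75.56 77.78
    print(round(profit_M_leader(best_aM), 2))             # Morgan: -> 140.56
    print(round(10 * q(best_aK, best_aM) - best_aK, 2))   # Kevin:  -> 142.22

(The tiny difference from the text's 140.55 is rounding.)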

Continuous Strategies – General Analysis

This section discusses games with continuous strategies in a very general way. It therefore appears more abstract and technical than the previous extended example. BUT, the ideas and techniques are the same as those above. The main point is to show that the intuition and technique discussed above are perfectly general and can be applied correctly to a broad array of situations involving strategic interdependence. In fact, if you come away with a good intuitive understanding of the basic concepts, you will be in very good shape for most of the remaining chapters - much of which will consist of applying these ideas to the analysis of market and firm structure.

Let's first define some notation. The two players are A and B. They may be firms, individuals, or nations, as the situation warrants. Their strategies are represented by xA and xB. Strategies may be the choice of advertising, price, or quantity by a firm, the choice of how much to spend on national defense by a nation, or any other continuous variable that must be chosen in a strategically interdependent environment. We will let π represent the payoffs. These may be profits for firms, utility for individuals, or whatever is most appropriate in any particular case. The important thing is that each player's payoff is a function of the choices of both players. So, player A's payoff is denoted πA(xA, xB) and player B's πB(xB, xA). As defined above, we will refer to the partial derivative of a player's payoff with respect to their own strategy as the Net Marginal Benefit, NMB, of their strategy - that is, the rate at which their payoff increases with a small increase in their strategy choice. The NMB of each player's strategy is a function of the level of both strategies - that is, $\partial\pi_A/\partial x_A = NMB_A(x_A, x_B)$ and $\partial\pi_B/\partial x_B = NMB_B(x_B, x_A)$.

Each player increases the level of their strategy as long as their NMB is positive. So, to find each player's reaction function, set their NMB equal to 0. Thus, the Nash Equilibrium is described by two conditions,



$$NMB_A(x_A, x_B) = 0 \quad \text{and} \quad NMB_B(x_B, x_A) = 0. \qquad (13.21)$$

Solving those two conditions gives the solutions for each choice in terms of the level of the other player’s strategy,

$$x_A^R(x_B) \text{ and } x_B^R(x_A), \text{ or } R_A(x_B) \text{ and } R_B(x_A). \qquad (13.22)$$

While we are presenting the case of only two players, everything works the same way with any number of players, n. The only difference is that there would be n NMBs to set equal to 0, which would give n reaction functions, providing n equations in n unknowns that could be solved for the solution to the game.

How do we find the slope of the reaction functions? That is, what determines whether A will increase or decrease xA when they expect xB to be higher? We described the intuitive arguments above. If an increase in xB increases NMBA, player A responds by increasing xA, and the reaction function slopes up. If an increase in xB decreases NMBA, player A responds by decreasing xA, and the reaction function slopes down. Let's confirm this mathematically for A (the procedure would be the same for B). First, substitute the solution for A's choice into the rule that A's NMB is 0, so we write

$$NMB_A\left(x_A^R(x_B), x_B\right) = 0. \qquad (13.23)$$

Then, take the derivative of both sides of (13.23) with respect to xB:

$$\frac{\partial NMB_A}{\partial x_A}\frac{dx_A^R}{dx_B} + \frac{\partial NMB_A}{\partial x_B} = 0. \qquad (13.24)$$

Intuitively, changes in xB have two effects on NMBA when xA has been chosen optimally. First, the direct effect is the second term in equation (13.24). In the advertising example, B's advertising increased NMBA if α was negative. If firms were instead choosing quantities to sell and the goods were substitutes, B selling more would decrease the NMB to A of selling more. Second is the indirect effect. In order to keep NMB equal to 0, xA must change to induce a change in NMBA that counters the effect of the change in xB on NMBA. This is the first term in equation (13.24). From the chain rule, the indirect effect is the derivative of NMBA with respect to xA times the derivative of $x_A^R$ with respect to xB. Equation (13.24) can be rearranged as follows:

$$\frac{\partial NMB_A}{\partial x_A}\frac{dx_A^R}{dx_B} = -\frac{\partial NMB_A}{\partial x_B} \qquad (13.25)$$

or

$$\frac{dx_A^R}{dx_B} = -\frac{\partial NMB_A/\partial x_B}{\partial NMB_A/\partial x_A}. \qquad (13.26)$$


This says the slope of the reaction function is the opposite of the ratio of the derivative of NMBA with respect to B's strategy to the derivative of NMBA with respect to A's strategy. Diminishing marginal returns implies the effect of xA on NMBA is negative. In fact, if it were positive, setting NMB equal to 0 would find a minimum (a valley), not a maximum (a peak), of A's profit. The fact that the denominator of the right hand side of equation (13.26) is negative cancels the fact that we are looking for the negative of the ratio. Thus, the sign of the slope of the reaction function, $dx_A^R/dx_B$, is the same as the sign of $\partial NMB_A/\partial x_B$. So, mathematically, if an increase in xB increases NMBA, A's reaction function slopes up, and vice versa. This is exactly as we argued intuitively and as we showed to be true in the advertising example. However, we have now established that it is true in general. This means we can use knowledge of the effect of one player's strategy on the net marginal benefit of the other player's strategy to determine the shape of the reaction functions generally. We can then use that knowledge to analyze the effects of outside factors on the equilibrium outcome - for example, the impact of a change in cost on the outcome of the advertising game above. The same type of analysis can be made whether the players are choosing prices, quantities, advertising, or some other variable.

We can also use this to determine how a game will change when one player moves first. Suppose player B moves first. Since B moves first, they can count on A responding optimally at the second stage. We can therefore substitute A's reaction function into B's payoff, to get πB(xB, xA(xB)). Maximizing this gives

$$\frac{d\pi_B}{dx_B} = \frac{\partial\pi_B}{\partial x_B} + \frac{\partial\pi_B}{\partial x_A}\frac{dx_A}{dx_B} = 0. \qquad (13.27)$$

In equation (13.27), player B takes account of the direct effect of their choice on their payoff and also the indirect effect of their choice on their opponent's ensuing choice, along with the effect of that choice on their payoff. If an increase in xA decreases B's payoff, then B wants to encourage A to select a lower level of xA. If the reaction functions slope up, B chooses a lower level of xB to induce A to choose a lower xA. This was the case in the advertising example. If the reaction functions slope down, B chooses a higher level of xB to induce A to choose a lower xA. For example, if x is a choice of quantity, B wants A to produce less, so B produces more when moving first. The other possibility is that an increase in xA increases B's payoff, in which case B wants to encourage A to select a higher level of xA. If the reaction functions slope up, B chooses a higher level of xB to induce A to choose a higher xA. This is the case if x is price and the two players sell substitute goods. To get the second mover to charge a higher price, the first mover chooses a higher price. If the reaction functions slope down, B chooses a lower level of xB to induce A to choose a higher level of xA.
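The sign result in equation (13.26) can also be checked symbolically. Here is a minimal sketch using sympy; it assumes the advertising example's demand takes the general form qK = 10 + 0.5aK + 0.5αaM - 0.005(aK + αaM)², which reproduces the α = -0.5 case used earlier (this functional form is our reconstruction, so treat it as illustrative):

    # Verify: slope of A's reaction function = -(dNMB_A/dx_B)/(dNMB_A/dx_A).
    import sympy as sp

    aK, aM, alpha = sp.symbols('a_K a_M alpha')
    profit_K = 10 * (10 + 0.5*aK + 0.5*alpha*aM - 0.005*(aK + alpha*aM)**2) - aK

    NMB_K = sp.diff(profit_K, aK)                      # net marginal benefit
    slope = -sp.diff(NMB_K, aM) / sp.diff(NMB_K, aK)   # equation (13.26)

    print(sp.simplify(slope))   # -> -1.0*alpha: slopes up (complements) when alpha < 0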



Chapter 14
Repeated Games

In repeated games, at least one player plays the game more than once. One of the more important things about this type of game is that some responses that were not rational in a one-shot game may be rational in a repeated game. Let's look at a scenario where two firms are competing in an industry, and each has the option to compete hard (low prices and high quantities) or soft (high prices and low quantities):

                         David
                    Hard              Soft
    Mike   Hard   M: 0,   D: 0     M: 15,  D: -10
           Soft   M: -10, D: 15    M: 10,  D: 10

Looking at the best responses (highlighted in the original), we can see they intersect in the top-left cell. This is the standard one-shot equilibrium, similar to the prisoner's dilemma. However, if there were some way for Mike and David to cooperate, they could both make a payoff of 10 (bottom-right cell) and be better off.

Let's suppose they both knew that they were going to play the game 10 times. Is it possible that the two might now have an incentive to cooperate by playing soft? That is, might each player try to earn and retain the good will of the opposing player by playing soft initially and then continuing to play soft as long as the other player does so? Ask yourself what would happen the 10th and final time the game is played. At this point, both players know it is the last round. There is no longer any reason to retain the other player's good will, since there is no future. Rationally, each player would play hard in round 10. If they expect the other to play hard, this protects them. If they thought for some reason the other would play soft, they make more profit playing hard, and there is no cost in terms of a loss of future good will. Each player should thus predict that Nash Equilibrium play in the last round is for both players to play Hard. Therefore, knowing this, there is no reason to bother maintaining good will in the next to last period either. Thus, they should both anticipate that equilibrium play in the next to last period is for both to play hard. Knowing that, there is no reason to try to cooperate in the period before that, and so on. Cooperation unravels from the known final period all the way to the first period of the game. In a repeated game with an end period that is known with certainty, the Nash Equilibrium of the repeated game is just the Nash Equilibrium of the one-shot game repeated again and again.



Infinitely or Indefinitely Repeated Games

What if the end period is not known with certainty? There are two possibilities here. The first is that the game is literally repeated infinitely - it simply never ends. The second possibility is that no one knows when the game will end. That is, there is some chance the game may end after any period. That probability may represent the chance a product becomes obsolete, or a player dies or retires, or that there is some major change in the structure of the market. We will focus on the latter type of game, in which there is some chance the game will end after any given round. In these types of games, there may be an incentive to maintain good will, since the players don't know whether the game will continue when choosing their strategies. We will let f represent the (constant) probability that the game ends after any play, and r represent the interest rate used to discount future payoffs to obtain their present value. Since the game could potentially continue forever, there are too many strategies to define individually. Therefore, we will focus on a couple of strategies that seem particularly reasonable in repeated games. Both are examples of trigger strategies - strategies where an observed action on the part of your opponent triggers a change in your play in future rounds.

• Tit for tat: In the first round, play cooperatively. Thereafter, play whatever the opponent played in the previous round. For example, in the price competition game, if Mike were to play soft in period one, and then every round thereafter play what David played the round before, Mike would be playing a tit-for-tat strategy. In other words, Mike "pays back" David with whatever response David gave Mike in the previous play. An important variation is tit for tat with forgiveness, whereby non-cooperative play is punished for a while, but then the player attempts to establish cooperation again.

• Grim: In the first period, play cooperatively. Then, cooperate as long as your competitor cooperates, but as soon as he stops cooperating, punish him every period thereafter. Think of it as rewarding cooperation forever, and punishing non-cooperation forever. We will focus on this strategy because it gives the strongest possible incentives to cooperate. If Grim can't induce your opponent to cooperate with you, nothing can.

Let's see if Grim is an equilibrium for our previous game if we assume r = 0.1 and f = 0.12. Note the payoffs for cooperating are 10 (the bottom-right cell).

                         David
                    Hard              Soft
    Mike   Hard   M: 0,   D: 0     M: 15,  D: -10
           Soft   M: -10, D: 15    M: 10,  D: 10


Assume that Mike plays grim. Is it a best response for David to play grim in return? If David plays grim, he will get 10 per round as long as the game goes on. His expected profit is:

$$E(\pi \mid grim) = 10 + \frac{0.88}{1.1}\,10 + \frac{0.88^2}{1.1^2}\,10 + \cdots$$

The first 10 is from the first period and is not discounted. The second 10 he gets only if the game continues (with probability 1 - f = 1 - 0.12 = 0.88), and if he does get the second 10 it has to be discounted back to its present value (dividing by 1 + 0.1). The third 10 only happens if the game continues through both periods, with probability $(0.88)^2$, and is then discounted back two periods, etc. Thus, in general, the expected profit is

$$10\sum_{t=0}^{\infty}\left(\frac{1-f}{1+r}\right)^t = 10\sum_{t=0}^{\infty}\left(\frac{0.88}{1.1}\right)^t = 10\sum_{t=0}^{\infty}(0.8)^t,$$

where t is the time period (round of play).

To find the value of $\sum_{t=0}^{\infty}(0.8)^t$, note that it is simply an infinite geometric series.

So, we can make use of the following convenient formula (which we will not prove):

$$\sum_{t=0}^{\infty} a^t = \frac{1}{1-a}.$$

(The familiar formula for the present value of a perpetuity with a periodic payment of A beginning at time 0 with interest rate r is just a special application of this result.) For our purpose, a = 0.8, so:

$$\sum_{t=0}^{\infty}(0.8)^t = \frac{1}{1-0.8} = \frac{1}{0.2} = 5.$$

So the expected payoff above, in which David plays grim in response to Mike's play of grim, boils down to

$$10\sum_{t=0}^{\infty}\left(\frac{1-f}{1+r}\right)^t = 10 \cdot 5 = 50.$$

Now let’s consider what would happen if David did not play grim. We will refer to this as “cheating”, since David is not cooperating when the other player is trying to. If David does not cooperate in the current round, he would play Hard. Matched against Mike’s play of Soft, David would earn a payoff of $15. However, thereafter Mike would play Hard, since his strategy is grim. David’s best play in all later rounds is Hard. Therefore, in all subsequent periods, David’s payoff will be 0. Then, David’s expected profit from the time he cheats going forward is

$$E(\pi \mid cheat) = 15 + \left(\frac{1-0.12}{1+0.1}\right)0 + \left(\frac{1-0.12}{1+0.1}\right)^2 0 + \cdots = 15.$$
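These two expected values are easy to confirm numerically; the following minimal sketch (illustrative, not from the text) simply truncates the infinite sums at a distant horizon:

    # Numeric check of E(pi | grim) = 50 and E(pi | cheat) = 15, r = 0.1, f = 0.12.
    r, f = 0.1, 0.12
    d = (1 - f) / (1 + r)       # one-period survival-and-discount factor (= 0.8)

    # Grim vs. grim: a payoff of 10 every period the game survives.
    E_grim = sum(10 * d**t for t in range(1000))        # truncated infinite sum

    # Cheating against grim: 15 now, then the one-shot Nash payoff of 0 forever.
    E_cheat = 15 + sum(0 * d**t for t in range(1, 1000))

    print(round(E_grim, 6), E_cheat)    # -> 50.0 15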


Since 50 > 15, David would rather play grim and "cooperate" each round by playing Soft until Mike fails to play Soft, rather than "cheat" at some point and play Hard. Since it is in David's interest to play grim in response to Mike's play of grim, we can conclude it is also in Mike's interest to play grim in response to David's play of grim, because the game is symmetric. Thus, both players playing grim constitutes a Nash Equilibrium in the infinitely (or indefinitely) repeated game.

Since playing grim is an equilibrium in this game (in other words, playing grim is a best response to playing grim), cooperation is possible. Note that this doesn't mean that non-cooperation isn't also possible. Since non-cooperation (both playing hard) was our original Nash equilibrium when viewing this as a one-shot game, playing hard forever is also an equilibrium in the repeated game. The fact that the game is repeated doesn't eliminate the original equilibrium; it just adds new potential equilibria. In any repeated game, repeating the one-shot equilibrium play each round is always an equilibrium.

To generalize the above example, let's define some notation:

• πCOOP: the one-period payoff when both players cooperate
• πNE: the payoff from the one-shot Nash equilibrium of the game
• πCHEAT: the one-period payoff when one player exploits the other; note, the "non-cheating" player must be cooperating in expectation that the "cheating" player will also cooperate

Assume there are two players, A and B. Assume one player plays grim. Is it a best response for the other to play grim as well? The expected payoff of grim against grim is:

$$E(\pi \mid grim) = \pi_{COOP} + \left(\frac{1-f}{1+r}\right)\pi_{COOP} + \left(\frac{1-f}{1+r}\right)^2\pi_{COOP} + \cdots$$

$$E(\pi \mid grim) = \pi_{COOP}\sum_{t=0}^{\infty}\left(\frac{1-f}{1+r}\right)^t.$$

Using the formula for the value of an infinite geometric series, we can simplify the above summation. For our purpose, first note

$$a = \left(\frac{1-f}{1+r}\right).$$

Then,

$$\frac{1}{1-a} = \frac{1}{1-\left(\frac{1-f}{1+r}\right)} = \frac{1}{\left(\frac{1+r}{1+r} - \frac{1-f}{1+r}\right)} = \frac{1}{\left(\frac{1+r-1+f}{1+r}\right)} = \frac{1+r}{r+f}.$$

So, to convert a flow of uncertain future payments starting immediately to its present value, multiply by $\frac{1+r}{r+f}$. Suppose instead the payment starts one period from now (either at the end of the current period or the beginning of the next); we simply subtract one from this, to capture the fact that there is no current payment. That yields

$$\frac{1+r}{r+f} - 1 = \frac{1+r-(r+f)}{r+f} = \frac{1-f}{r+f}.$$

So, to convert a flow of uncertain future payments starting in one period to its present value, multiply by $\frac{1-f}{r+f}$. Note that if f = 0, this gives the well-known multiplier to find the value of a perpetuity, 1/r. Returning to the problem at hand, we have:

$$E(\pi \mid grim) = \pi_{COOP}\left(\frac{1+r}{r+f}\right).$$

This expression is just the expected present value of playing grim against grim and cooperating forever. It will soon be useful to note that we could also express this as:

$$E(\pi \mid grim) = \pi_{COOP} + \pi_{COOP}\left(\frac{1-f}{r+f}\right).$$

This latter expression simply breaks the expected present value into the initial payoff plus the expected present value of the uncertain perpetuity beginning in one period.

Now let's look at the expected value from cheating, evaluated from the period at which the player cheats going forward. Remember, the player that cheats gets one period of πCHEAT followed by the one-shot Nash equilibrium forever when playing against grim.

$$E(\pi \mid cheat) = \pi_{CHEAT} + \left(\frac{1-f}{1+r}\right)\pi_{NE} + \left(\frac{1-f}{1+r}\right)^2\pi_{NE} + \cdots$$

The payoff is the profit from "cheating" plus the expected present value of receiving the one-shot Nash equilibrium profit beginning one period out and continuing until the game ends. This is just:

$$E(\pi \mid cheat) = \pi_{CHEAT} + \pi_{NE}\sum_{t=1}^{\infty}\left(\frac{1-f}{1+r}\right)^t.$$

Making use of the fact that the expected present value of an uncertain perpetuity beginning in one period is found by multiplying by $\frac{1-f}{r+f}$, this can be written as:

$$E(\pi \mid cheat) = \pi_{CHEAT} + \left(\frac{1-f}{r+f}\right)\pi_{NE}.$$


Cooperation in a repeated game is possible if the expected present value of playing grim in response to grim exceeds the expected present value of “cheating”, that is if E(grim) > E(cheat). From the work above, this is:

$$\pi_{COOP} + \pi_{COOP}\left(\frac{1-f}{r+f}\right) \geq \pi_{CHEAT} + \left(\frac{1-f}{r+f}\right)\pi_{NE}.$$

Rearranging this inequality, we get

$$\left(\pi_{CHEAT} - \pi_{COOP}\right) \leq \left(\frac{1-f}{r+f}\right)\left(\pi_{COOP} - \pi_{NE}\right).$$
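Before interpreting this condition, note that it is straightforward to apply numerically. Here is a minimal sketch (the function name and the second set of parameter values are ours) checking it for David and Mike's game:

    # Is grim-trigger cooperation sustainable?
    # (pi_cheat - pi_coop) <= ((1 - f) / (r + f)) * (pi_coop - pi_ne)

    def cooperation_sustainable(pi_coop, pi_cheat, pi_ne, r, f):
        gain_now = pi_cheat - pi_coop                        # one-period gain from cheating
        future_loss = (1 - f) / (r + f) * (pi_coop - pi_ne)  # EPV of forgone cooperation
        return gain_now <= future_loss

    # David and Mike's game: cooperate = 10, cheat = 15, one-shot Nash = 0.
    print(cooperation_sustainable(10, 15, 0, r=0.1, f=0.12))   # -> True
    # With a much higher chance the game ends, cooperation fails:
    print(cooperation_sustainable(10, 15, 0, r=0.1, f=0.70))   # -> False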

The left hand side is the current gain from cheating instead of cooperating. The cost of cheating this period is that in all future periods a cheater will get the Nash equilibrium profit, not the cooperative profit. So, the right side is the expected present value of the future profits that are given up by a cheater. This inequality just says that if the expected present value of the future profits lost when a player cheats at the current play exceeds the current gain from cheating instead of cooperating, cooperation is a Nash equilibrium.

Since the right-hand side must be at least as large as the left for cooperation to be in a player's self interest, several important conclusions follow. If πCOOP increases, cooperation will be more likely. Conversely, if πCHEAT increases, cooperation will be less likely. If r, the interest rate, increases, present profits are valued more highly relative to future profits, making cheating more attractive. If f, the probability that the game ends, increases, the future is less valuable, and again it will be harder to cooperate.

An increase in the number of players makes cooperation more difficult to sustain for two reasons. First, the gains to cooperating are likely to fall with the number of players. If a cartel makes all the firms act collectively as a monopoly, the profit each firm gets individually is (1/n)th of the monopoly profit, $\pi_{MONOPOLY}/n$. Thus, as n increases, each firm's "cooperating" profit decreases, which decreases the incentive to cooperate. Second, cooperation hangs on the promise of reward versus the threat of punishment. In the real world, punishment can be much more difficult because it can be hard to detect cheating. If we let c be the cost of monitoring one firm, and n be the number of players, then each player has to monitor n - 1 firms (since they don't have to monitor themselves) to make sure no one is "cheating." Thus, an individual's monitoring costs are c(n - 1). So, monitoring costs are proportional to n - 1, while the gain to cooperating will tend to be inversely proportional to n. Individually, the net gain to cooperating thus falls for two reasons. From the cartel's point of view, the total cost of monitoring is c(n - 1)·n, since there are n firms each incurring costs of c(n - 1). So, total monitoring costs increase with the square of n, and thus can seriously erode the total gain to cooperation in a large cartel.

If information were free and perfect, monitoring would not be needed. If any player's profit deviated from what they expected based on the "cooperative" agreement, they would know someone "cheated" on the agreement. However, monitoring and punishing players can be complicated by the presence of noise. Some things happen in the real world due to random events we have not explicitly included in the model. The triggers for punishments in cooperative strategies must be observable signals that everyone agrees on. But noise means that the signal may be received sometimes when cheating has occurred, and it may be received other times when cheating has not occurred. In fact, if the cooperative agreement (whether implicit or explicit, like a cartel) represents a cooperative Nash equilibrium, then it is in no one's interest to cheat. In that case, when a signal is received that indicates cheating, the members should KNOW that no one cheated. But the signal still triggers punishment, even though everyone knows no one cheated. So, maybe it would make sense not to pull the trigger on punishment when the agreed upon signal says to do so? While the reasoning may seem subtle, doing that would entirely undermine cooperation. If all players believed that everyone believed no one would cheat, and thus would not punish, it would become in everyone's interest to cheat, and cooperation would totally break down. For cooperation to work, there has to be an "enforcer" in the group who everyone believes will pull the trigger when the agreed upon signal arrives, whether or not anyone actually believes anyone cheated. Otherwise, everyone cheats all the time.

An example: Suppose OPEC (the oil cartel - the Organization of Petroleum Exporting Countries) agrees on a new production quota and expects crude prices to average $95 per barrel. Suppose, then, that they actually observe average prices of $75 per barrel. Does that mean some members are "cheating" by producing over their quotas and thus driving prices down? Or does it mean that supplies from other producing nations are unexpectedly high for some unknown reason, or that world oil demand is unexpectedly low? It may not be possible to know the answer for certain. Further, to the extent the answer can be determined, it may take a long time. By then, if it is due to "cheaters," they have had a long time to cheat. But if the members start punishing one another, they may be doing so only because demand was lower than expected, not because anyone cheated. The "enforcer" in OPEC is Saudi Arabia. They are the enforcer due to their large excess capacity. They maintain discipline in OPEC because they have the power to flood the market and reduce prices for everyone. That threat is useless unless they are willing to punish everyone even if they are not completely sure how much any particular member cheated, and how much the observed price is due to factors outside every member's control.

Noise means it is a good idea to build some "forgiveness" into trigger strategies. Perhaps, if everyone cooperates, the average price is expected to be $95. But, since there is no certainty, perhaps punishment is only triggered if the average price falls below $80. Then, after some period of punishment, the strategy should allow for the establishment of new targets. The occasional delivery of punishment (for random reasons in equilibrium) ensures that no one ever cheats. The lower target for the trigger reduces the frequency of punishment, and the fact that the punishment is limited in duration allows the reestablishment of cooperation. But all of these things make it harder to punish cheaters and, because punishment is somewhat random, reduce the gains to cooperation. So, noise makes cooperation harder to sustain. Noise is also related to monitoring costs, in that the noisier the environment, the more expensive it may be to collect useful (but imperfect) signals about the behavior of other players.

In summary, cooperation is more likely where there is less noise, fewer players, a lower probability of the game ending, less profit from cheating, and lower interest rates. To hold a cartel together, you need all of those things in your favor; otherwise, the gains from cheating become too large.

Repeated Games with Reputation Effects

We are now going to complicate a typical game by considering reputation as a factor. Reputation is just what it sounds like - an intangible benefit that influences players' behavior. Up to this point, we've assumed all players act rationally with respect to their payoffs; in other words, players maximize their expected payoff in a given situation. However, some players may care about other things - for example, simply being honest. So, perhaps there are players out there who will cooperate if they say they will cooperate, or who will enter a market if they say they will. They do these things regardless of whether or not it appears to be in their self-interest at the time. Cultural and social references to such behavior are common - for example, the following two.

"Recompense injury with justice, and recompense kindness with kindness" (Confucius).

“An honest man's word is as good as his bond” (John Ray's English Proverbs, 1670).

We will call players who play in ways like this "crazy." We don't mean "crazy" in the usual sense - only that they do not strictly maximize their payoffs within the rules of the game as written. They may simply care about things we have not included in the model. That makes them behave in ways our model would not predict. The presence of a small number of such "crazy" players can completely change the way a game is played. A player that everyone thinks is "crazy" in this way can change the way others play, because they do not expect the "crazy" player to be rational. Therefore, players who are not crazy may nonetheless seek to gain a reputation as "crazy" to change the way other players play against them.

Let's look at a very simple example of this first. Consider the following game in which two firms engage in price competition. Each strategy is just how competitively the firm prices its products, with hard being pricing below cost in an attempt to undercut the competitor, soft being something like monopoly prices, and medium somewhere in between.



                                  Player B
                        Soft           Medium          Hard
              Soft   A: 10, B: 10   A: -5, B: 15   A: -10, B: 0
    Player A  Medium A: 15, B: -5   A: 0,  B: 0    A: -5,  B: -5
              Hard   A: 0,  B: -10  A: -5, B: -5   A: -10, B: -10

Looking at their best responses (highlighted in the original), we see both players have a dominant strategy to play medium in a one-shot game, and thus the one-shot Nash equilibrium is the middle cell. Let's assume that this is a repeated game with a known end period - it ends in time period T. With only rational players, unraveling means cooperation is not possible at all with the known end period. However, let us allow for the possibility of "crazy" players in this game. Suppose there is a probability f that a given player is crazy. For purposes of this game, assume a "crazy" player plays soft in the first period and continues to do so as long as their competitor has played soft in previous rounds. But if the competitor plays medium or hard in one period, the "crazy" player plays hard the following period to punish the competitor. After the period in which the punishment is delivered, the "crazy" player plays medium forever.

Can this change the way a rational player will act? Might it be rational for a player who is not crazy to mimic the behavior of a crazy player, at least for a while? Looking at the end of the game in time period T, all players know the game will be over. Thus, a rational player will play their strongly dominant strategy of medium, since even if their competitor were crazy, there isn't time left for the crazy player to punish them. What about period T-1, one period before the final period of the game? Might it be in a rational player's interest to play soft in period T-1? Or are they sure to play medium, in which case cooperation will unravel all the way back to the beginning?

Consider the following strategy on the part of sane players - in all periods but the last, play soft until an opponent plays medium, after which play medium; and in the last period, play medium. Suppose player A thinks B is playing that strategy if B is sane, but there is a chance, f, that B is crazy. Is it in A's best interest to adopt the same strategy as B? Or will A just play the dominant strategy? Let's look at period T-1. If A plays soft, he gets 10 in period T-1. Then, in period T, if B is crazy, B plays soft, A plays medium, and A earns a payoff of 15. But if B is not crazy, they both play medium and both earn 0. Evaluated at period T-1, the expected present value of playing soft in period T-1 then medium in T, EA(πA|S,M), is then

$$E(\pi_A \mid S, M) = 10 + \frac{f}{1+r}15 + \frac{1-f}{1+r}0 = 10 + \frac{f}{1+r}15.$$



The alternative would be to play medium at T-1. Then, A makes 15 in period T-1. If B is sane, both play medium in period T, and A makes 0. If B is crazy, B plays hard in period T and A makes -5. The expected payoff, evaluated at period T-1, of playing medium in both remaining periods, EA(πA|M,M), is

$$E(\pi_A \mid M, M) = 15 - \frac{f}{1+r}5 + \frac{1-f}{1+r}0 = 15 - \frac{f}{1+r}5.$$

Now that we have the expected profit for both of A’s strategies, we want to know when the first is more profitable than the second. That is, when:

$$E(\pi_A \mid S, M) \geq E(\pi_A \mid M, M)$$
$$10 + \frac{f}{1+r}15 \geq 15 - \frac{f}{1+r}5$$
$$\frac{f}{1+r}20 \geq 5$$
$$f \geq \frac{1+r}{4}.$$
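A minimal sketch (ours; the interest rate value is illustrative) confirming the threshold implied by this inequality:

    # Expected payoffs at T-1 against an opponent who is crazy with probability f.

    def E_soft_then_medium(f, r):
        return 10 + (f * 15 + (1 - f) * 0) / (1 + r)

    def E_medium_then_medium(f, r):
        return 15 + (f * (-5) + (1 - f) * 0) / (1 + r)

    r = 0.05
    f_star = (1 + r) / 4          # threshold from the inequality above
    print(round(f_star, 4))       # -> 0.2625
    print(E_soft_then_medium(0.30, r) > E_medium_then_medium(0.30, r))   # -> True
    print(E_soft_then_medium(0.20, r) > E_medium_then_medium(0.20, r))   # -> False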

If the interest rate is low, f need be only slightly larger than 0.25 for it to be worth playing soft in period T-1 to maintain the possibility of the good will of a crazy opponent in period T. Similar reasoning means it is worthwhile to play soft in all earlier periods, too. The (S,M) strategy essentially boils down to player A acting "crazy" up to the last time period, T.

There are several important things to understand about the previous example. First, we are not saying it is in player A's best interest to actually be "crazy." A "crazy" player will always play soft if their competitor did, even in the final round. We know that it is never rational to play soft in the final round; so what player A is considering is not actually whether to become "crazy," but whether it is worth acting as if she were "crazy" to take advantage of the value of that reputation. Further, actually being "crazy" is not a choice. Rational players may choose to mimic crazy ones, at least for a while, to develop a valuable reputation as a cooperator that allows them to make $10 every period before the last instead of 0, and then to have a chance at making 15 in the last period. Another conclusion that follows from this is that a reputation only has value as long as there is time left in the game to make use of it. The closer to the end of the game you are, the less valuable a reputation becomes. Finally, a rational player who has gained a reputation will want to "milk" that reputation in the last period. This means that even though it may be rational for player A to act "crazy" in all periods up to period T, she will "milk" her reputation by playing medium in the final period, T. This is simply because it is no longer beneficial to maintain her reputation once the game ends.

Now that we have considered the case where a reputation as a cooperator may be beneficial, let's look at the case where a reputation as a fighter may be beneficial. For this situation, we will use an entry game as an example, but allow the entrant to move first. The entrant can enter or not; then, the incumbent has the option to fight or accommodate the entrant. The game tree is shown below:

[Game tree: The Entrant moves first, choosing Enter or No. If No, the payoffs are E: 0, I: 40. If Enter, the Incumbent chooses Accommodate, giving E: 10, I: 20, or Fight, giving E: -10, I: 10.]

Using the red lines to eliminate irrational responses, we see that the sub‐game perfect Nash equilibrium is that the entrant enters, and the incumbent accommodates. Now, just like when we were considering cooperation in the previous game, we want to know whether or not it is rational for the incumbent to act as a fighter. Suppose there is some chance, f, that an entrant may encounter a crazy incumbent who always fights. Will a rational incumbent want to mimic the crazy one, and, will that deter the entrant? Let’s look at the game again, adding the value of that reputation:

[Game tree: The Entrant chooses Enter or No. If No, the payoffs are E: 0, I: 40 + VN. If Enter, the Incumbent chooses Accommodate, giving E: 10, I: 20 + VA, or Fight, giving E: -10, I: 10 + VF.]

Notice the addition of the variables to the incumbent's payoffs. VF is added to the incumbent's payoff when he fights. This signifies the expected present value of future payoffs if the incumbent fights; think of it as the future profits he earns by keeping other competitors out of the industry. VA is added to the incumbent's payoff when he accommodates; similarly, it is the expected present value of future payoffs if the incumbent accommodates. Finally, VN is the expected present value of future profits if the incumbent neither fights nor accommodates because the entrant does not enter.

Before we analyze this game, it's important to note that the incumbent is playing this game many times, while the entrant is only playing it once. This is where the incumbent's reputation comes into play; he is weighing the value of a reputation for fighting against the value of a reputation for accommodating. He will face many entrants, and what he does now will impact how they act in the future. Also, you can think of fighting in this game as similar to acting "crazy" in the last one, since fighting isn't the SPNE. The incumbent is deciding whether or not to act "crazy," not whether or not to actually become "crazy."



Now let's ask a question: when is it rational for the incumbent to fight (i.e., act "crazy")? He fights when his expected payoff from fighting is greater than his expected payoff from accommodating. Looking at his payoffs, we see

E(π | fight) > E(π | accommodate)
10 + VF > 20 + VA
VF - VA > 10.

So when the difference between the value of a reputation as a fighter and the value of a reputation as an accommodator is greater than 10, the incumbent would rather fight. We're not going to solve this game explicitly, as we did the last one, but we can make some general observations. Imagine that the incumbent fights off the first entrant, and that this is enough to keep all future entrants from entering. What would VF be? The period after the incumbent fights, no more entrants will ever enter again. Thus, he will earn a payoff of 40 forever. Taking into account discounting, the present value of that 40 per period is 40/r. Since that is what he earns due to fighting in the first round, VF = 40/r. Similarly, imagine the incumbent accommodates the first entrant, and that this is enough to show all future entrants that he is an accommodator. What would VA be? After the first period, future entrants will always enter, since they know the incumbent will always accommodate. Since he earns a payoff of 20 for accommodating, that payoff forever is just 20/r. Since that is what he earns as a result of accommodating in the first round, VA = 20/r. So the conclusion we can draw about the inequality VF - VA > 10 is that if the incumbent plays the game a long time, the left hand side could approach

$$\frac{40}{r} - \frac{20}{r} = \frac{20}{r},$$

which clearly exceeds 10 for any realistic interest rate.

Let's now define g to be the probability that a "sane" incumbent fights; remember that such an incumbent is just acting like a fighter in order to secure that reputation. We already derived the inequality that tells us when it is worthwhile to fight, so g is just the probability that it holds, or Pr(VF - VA > 10).

Now let's look at the game from the entrant's perspective. When the entrant decides on entering or not entering, he ultimately wants to predict whether or not the incumbent will fight him. If the incumbent fights, the entrant would rather stay out, since his payoff from entering would be -10. The entrant may be facing one of three types of incumbents - a crazy incumbent, a sane one that fights anyway, or a sane one that will accommodate. The total probability of a fight if the entrant enters is f + g. The entrant stays out if

E(π | Enter) < 0
(1 - (f + g))·10 - (f + g)·10 < 0
10 < 20(f + g)
0.5 < f + g,



which says that if the total chance that the incumbent will fight is greater than 50%, the entrant will stay out. The following observations are important about reputations in repeated games:

• It is possible for g and VF - VA to be high, even if f is low, provided r is low and the incumbent will play a long time. The intuitive interpretation is that an incumbent who is playing a long time, who may well not be crazy (low f), but who values the future a lot (low r), will still fight to secure a reputation as a fighter to keep entrants out.

• A reputation is not valuable if it is "cheap," in the sense that it is easy to come by. Remember, to gain the reputation as a fighter, the incumbent had to fight occasionally, which cost him payoffs in those periods. If the forgone payoffs were really low, the entrant would know that the incumbent could get the reputation back at any time, and therefore that the incumbent will not be willing to sacrifice much to keep it.

• The value of a reputation drops as "retirement" nears. The inherent value of a reputation is that it secures future profits. As the remaining time periods of the game dwindle, the reputation won't be as valuable to the incumbent, and thus it will be less likely that he will try to maintain it.
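To tie the entry example together, here is a minimal sketch of the logic above (the parameter values, including g, are purely illustrative, since the text does not solve this game explicitly):

    # Entry-deterrence sketch: reputation values and both players' decisions.
    r, f, g = 0.05, 0.10, 0.45   # interest rate, Pr(crazy), Pr(sane-but-fights)

    V_F = 40 / r                 # fight once, deter all future entrants
    V_A = 20 / r                 # accommodate once, invite all future entrants

    incumbent_fights = (V_F - V_A) > 10                  # condition derived above
    entrant_expected = (1 - (f + g)) * 10 - (f + g) * 10
    entrant_enters = entrant_expected >= 0

    print(V_F - V_A, incumbent_fights)        # -> 400.0 True
    print(entrant_expected, entrant_enters)   # -> -1.0 False: entry is deterred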



Chapter 14 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Unraveling - The process of starting at the end of a finite game to see how players will act. Because players know when the final stage of the game will occur, they have no incentive to cooperate in that stage, and thus none in the penultimate stage, and so on back to the first round.

Probability of Obsolescence - The probability that the game will end after any play. As this probability increases, cooperation becomes less likely to occur.

Trigger Strategy - Strategy in which a player acts a certain way until the opponent acts differently, at which point the player changes his behavior.

Tit for Tat Strategy - Trigger strategy in which a player does whatever his opponent did in the previous period.

Grim Trigger Strategy - Strategy in which the players play cooperatively in the first period, but as soon as one player stops cooperating, the other punishes him every period thereafter.

Monitoring Cost - The cost of detecting and punishing players who do not cooperate. As this cost increases, cooperation becomes less likely to occur.

Noise - The sheer randomness in the real world that cannot be modeled but can affect outcomes. As the level of noise increases, cooperation becomes less likely to occur.

Cartel - A cooperative, often informal, agreement between two or more firms, usually in an oligopolistic industry.

Reputation - An intangible benefit that influences other players' behavior. Certain players may forgo the highest payoff today in order to secure a reputation for the future.

Crazy Player - A player who does not maximize his current possible payoff. This could be because of the value of having a reputation.



Part 5
Product Market Structure, Strategy, and Analysis



Chapter 15
Homogenous Product Markets

When talking about market structure, firms usually fall somewhere along a spectrum that describes how competitive their industry is. At one end is a monopoly, which consists of a single firm that faces no competition and has complete control over the market price. At the other end is perfect competition, in which there are many firms and each individual firm has no control over price levels; they are price takers, as described earlier in the course. Firms engaging in perfect competition have no strategic decisions to make, since they take their price from the market; nothing they do will have any significant impact on the other firms in the market. An oligopoly falls somewhere in the middle of the spectrum. An industry that is an oligopoly consists of at least two firms, each having significant market share. Since there are a relatively small number of large players, the managerial decisions that each firm makes have a significant impact on the demand that other firms experience. It is a market in which game theory can be used to help predict how firms will act, since all of the firms are strategically connected.

Before talking more about oligopoly, let's illustrate a typical monopoly on a graph. Demand is D and marginal revenue is MR. Remember, MR < p since you have to lower price to sell one more unit, so the MR curve will lie below the D curve. The red lines are marginal cost (MC) and average cost (AC). MC must cross AC where the AC line is at a minimum, because a marginal cost above average cost pulls average cost up, and a marginal cost below average cost pulls it down. A monopoly will maximize profit where MR = MC, shown by pMON and QMON. The profit is just total revenue minus total cost. The AC curve tells us the average cost of each unit, so the profit is just the area above ACMON and below pMON, shown by the blue box in the graph.

[Figure: price p against quantity Q; demand D, marginal revenue MR, marginal cost MC, and average cost AC; the monopolist produces QMON where MR = MC, charges pMON, and earns profit π equal to the rectangle between pMON and ACMON.]

In the free market system, when a firm is making profit, other firms are bound to enter. This will happen unless there are barriers to entry. These could be legal barriers, such as regulations or patents; or they could be due to sheer economies of scale, such as one firm always being able to produce a given output more cheaply than two firms could. Unless there are barriers to entry, profits will attract entrants. (Note: monopoly rights don't necessarily guarantee profits if, for example, you had monopoly rights to produce VHS tapes or some other obsolete product.)
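A quick numeric illustration of the MR = MC rule just discussed (the demand and cost functions are our illustrative assumptions; the chapter's graph is generic):

    # Monopoly profit maximization with assumed curves:
    # demand p = 100 - Q; cost C(Q) = 20*Q + 100 (so MC = 20, AC = 20 + 100/Q).

    def profit(Q):
        p = 100 - Q
        return p * Q - (20 * Q + 100)

    # MR = 100 - 2Q equals MC = 20 at Q = 40, p = 60; check with a grid search.
    Q_star = max((q / 10 for q in range(1, 1000)), key=profit)
    print(Q_star, 100 - Q_star, profit(Q_star))   # -> 40.0 60.0 1500.0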



Bertrand (Price) Competition with Homogenous Products

Now let's imagine we have two players, A and B, engaging in price competition. Assume these two firms make and sell homogenous products; that is, assume they are selling the same thing, and the only difference is the price they charge. Therefore, the customer base is indifferent between their products and will just buy from whichever firm has the lower price. Next, assume both players announce their price simultaneously, and the firm with the lowest price meets whatever level of demand it encounters at that price. If they announce the same price, they split the market evenly. Finally, assume c is the constant per unit cost for both firms, and ε is the smallest increment by which price can be changed.

To determine the Nash equilibrium of this game, let's look at player A's best responses. (Since the game is symmetric, these mirror player B's best responses.) If player B sets a price of pB, as long as pB > c + ε, player A will set a price of pA = pB - ε in order to undercut player B and get the whole market. If pB = c + ε, player A won't want to undercut player B anymore, since doing so would mean pA = c, so that player A earns just enough to cover costs and thus makes no profit. Therefore, if pB = c + ε, player A will set the same price, pA = c + ε; the two earn a profit of ε per unit and split the market evenly. If pB = c, player A doesn't really care what price it sets, since setting it at c would mean 0 profit, and setting it above would not gain any market share. Just as when we were looking at game theory tables, we want to know where the best responses intersect. The only two points where they intersect are pricing at cost or pricing just slightly above cost. Since each firm has the incentive to lower price and capture the entire market, prices will be driven down to the competitive level (unit cost, or unit cost plus ε), and neither firm will make any significant profit.

This result also has applicability to a monopolist producing a highly durable good. Imagine for simplicity that the good lasts forever, such as diamonds, that these goods never lose any value, and that the inflation-adjusted interest rate is near 0. Suppose consumers expect that the price charged today will be the same as the price charged tomorrow. Once the monopolist sets price, everyone willing to buy at that price buys. When tomorrow rolls around, if the monopolist wants to sell more, he will have to lower price. The next day he would have to lower price again, and so on, until he is pricing at cost. But the customers today would be able to predict that, and since it's the same product tomorrow and the next day, they would just wait. Thus, today's durable goods monopolist faces price competition from his future self, and that competition forces him to sell his product near cost today.

In the previous example, we saw how price competition can drive a two-firm industry (or even a durable goods monopolist) to sell products at or near cost; however, we don't readily observe this in the real world. This is because the games above rest on an underlying assumption that typically does not hold.
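The undercutting dynamic is easy to simulate. Below is a minimal sketch (the cost, starting prices, and tie-breaking rule are illustrative; prices are in cents so the smallest increment is exact) in which the firms alternately undercut one another until price reaches the competitive level:

    # Bertrand undercutting with homogenous products.
    c, eps = 1000, 1               # unit cost of $10.00 and a 1-cent increment
    p_A, p_B = 5000, 5000          # both firms start at $50.00

    while min(p_A, p_B) > c + eps:         # keep undercutting while profitable
        low = min(p_A, p_B)
        if p_A >= p_B:                     # the (weakly) higher-priced firm
            p_A = low - eps                # undercuts by one increment
        else:
            p_B = low - eps

    print(p_A / 100, p_B / 100)    # -> 10.01 10.02: prices driven to ~cost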



Referring to the first game, we saw that the Nash equilibrium was for each firm to produce half of the industry quantity, or Q/2. We also saw that player A had the incentive to undercut player B (when pB > c + ε) in order to capture the entire market. However, if player A were to capture the entire market, she would need to be able to produce Q units; that means A's capacity must be Q units. Similarly, for B to undercut A, B's capacity must be Q units, too. Capacity is expensive, and it takes lots of labor and time to get factories and workers in place. Why would either player ever expend resources to build Q units of capacity when they expect to produce only Q/2 units in the Nash equilibrium? Surely the managers would be fired for wasteful spending if they built plants twice as large as needed. Therefore, the players will not always be able to undercut one another, for lack of capacity. This is true in general; pure price competition is not a sensible model when dealing with homogenous (or durable) products if the firms must also choose their capacity.

Simultaneous Homogenous Product Price Competition with Capacity Choice

So, to reasonably model homogenous product oligopoly, we have to include the choice of capacity. Remember, we said that the only reason a firm would engage in price competition with another firm in this industry is if it had excess capacity to do so; but the Nash equilibrium of that game was each firm splitting the market. Therefore, no rational player would hold excess capacity, since it is costly. We will now see that adding capacity limits changes the strategic decisions in this game.

Let's assume an inverse demand curve of p(Q), which just describes price as a function of market quantity. Let Q be industry quantity and, allowing for two firms A and B, let Q = qA + qB, where qA and qB are player A's and B's quantities, respectively.

We will model price competition in a homogenous product market as a two-stage game, with moves within each stage simultaneous. In the last stage, each firm announces a price and sells what is demanded at that price, up to its capacity. In the first stage, each firm chooses its capacity. Capacity cost is constant at k per unit and operating cost is constant at c per unit. Letting $\bar{q}$ represent capacity, A's capacity is $\bar{q}_A$ and B's is $\bar{q}_B$. In choosing capacity, neither firm has any incentive in this game to build significant excess capacity. So, they both expect that total sales in the second stage will be the sum of the two firms' capacities, or

Q = q̄A + q̄B. Thus, looking ahead from the first stage, they expect that market price will be p = p(q̄A + q̄B).

Now, let’s look at player A’s profit function.

π = pq – C(q).



Capacity cost is kq̄A and, with no excess capacity, operating cost is cq̄A. Substituting p = p(q̄A + q̄B) for price, this becomes

πA = p(q̄A + q̄B)q̄A − kq̄A − cq̄A.

Notice this is the same as our normal profit function: the price is the market price, which is based on q̄A + q̄B; the quantity that A sells is q̄A; and player A incurs two costs, capacity costs of k per unit (the actual costs of getting the plants up and running) and operating costs of c per unit (the costs of making each unit). This is analyzed just like any game with continuous strategies, as described in chapter 10. Player A will choose q̄A based on her best guess about q̄B. So, the game now boils down to the simultaneous choice of capacities. Since capacity will ultimately equal quantity sold, it is a game in which quantities are simultaneously chosen, and then the price is ultimately whatever level clears the market. This type of game is known as a Cournot game. So, simultaneous homogenous product price (Bertrand) competition becomes simultaneous homogenous product quantity (Cournot) competition when the initial choice of capacity is modeled.

Simultaneous Quantity (Cournot) Competition

To generalize, we know the amount firm A will produce implicitly depends on capacity, so instead of writing q̄A, we can simply say that firm A will produce qA units, with the understanding that this is based on her choice of q̄A. Also, we can define player A's total cost in the form of a cost function C(qA). Rewriting her profit function with these generalizations, it becomes

πA = p(qA+qB)qA – C(qA)

which says the price determined in the second stage is just the price that clears the market given total available quantity (capacity). Let's look at maximizing this graphically. Market demand is shown by p(Q). Since A will maximize profit based on her best guess about what B will produce, the residual demand that she faces is the industry demand minus the amount B produces (qB).

[Figure: A's residual demand is the market demand p(Q) shifted left by qB; A's marginal revenue MRA, derived from the residual demand, intersects MCA at qA.]

Since A will produce where MR=MC, qA is where A's MR curve intersects her MC curve. Given this information, the problem looks like any other profit maximization problem. It becomes a bit more complicated when we analyze exactly how player A makes her best guess about the quantity that B will produce. Let's look at A's profit function again, and set the derivative equal to 0:

πA = p(qA + qB)qA – C(qA)

dπA/dqA = p + (∂p/∂qA)qA − dCA/dqA = 0      (using the product rule to get marginal revenue)

Notice that the term dCA/dqA is nothing more than A's marginal cost, MCA. Now, when we used the product rule to calculate A's marginal revenue (the first two terms of the derivative), the derivative of p(qA + qB) was (∂p/∂qA); we can rewrite this as

∂p/∂qA = (dp/dQ)(∂Q/∂qA),

since the right side says first how price changes with respect to industry quantity (dp/dQ), and then how industry quantity changes with respect to A's quantity (∂Q/∂qA). This is the same as how price changes with respect to A's quantity (∂p/∂qA), as written on the left; we have just split it into two derivatives. Splitting it up allows us to simplify, however, because we know Q=qA+qB, which means

∂Q/∂qA = 1.

So the equation becomes

dπA/dqA = p + (∂p/∂qA)qA − dCA/dqA = 0

dπA/dqA = p + (dp/dQ)qA − MCA = 0

p + (dp/dQ)qA = MCA.

Now, let’s let sA be A’s market share. This is just the share of the total industry quantity that qA is; in other words, sA = (qA/Q). Solving for qA we get qA=sAQ and plugging this in to the above equation we get

p + (dp/dQ)QsA = MCA.

The reason we’ve gone through this mathematical manipulation is to express a point regarding competition. In the above equation, the left side is marginal

268


revenue, and the right side is marginal cost, which we know are equal if a firm is maximizing profit. Now consider a firm in a monopoly. What is its market share? 100%, or sMON=1. So the marginal revenue of a monopolist is

MRMON = p + (dp/dQ)Q.

We also know a monopolist has to lower price in order to sell another unit. This implies (dp/dQ) is negative, and it follows that MR<p. The marginal revenue of firm A in competition with other firms is

MRA = p + (dp/dQ)QsA

and we can see that as more firms enter the industry, A's market share sA goes down. For firms in perfect competition, sA approaches 0, and MR approaches price. For firms in an oligopoly, sA will be somewhere between 0 and 1. Looking back at the solution to player A's maximization problem

p + (dp/dQ)qA = MCA

remember that market demand p(Q) depends not only on qA but also on qB; thus, when we solve this equation, we will get qA as a function of qB. This is because player A is maximizing her profit given a "best guess" about player B's quantity. In our section on game theory, we described a set of best responses as a reaction function, and this is exactly the same thing. So, solving the above gives us A's reaction function, which depends on qB, or

qA = RA(qB)

and doing the same for player B will give us

qB = RB(qA)

which is just a function telling player B what to produce for a given "guess" of qA. Notice the only difference between these maximization problems and the previous ones is that there is now an unknown that each player has to make an assumption about. The principle of setting marginal revenue equal to marginal cost still holds, as it always will; it's just that marginal revenue itself has become slightly more complicated.



Example

Let’s look at a numerical example, and see how this uncertain variable affects a firm’s profits. Suppose cost/unit is $5, and let market demand be P(Q)=20 ‐ .25Q. The profit of a monopolist is

πMON = (20 − .25Q)Q − 5Q

MR = 20 − .5Q = 5 = MC      (taking the derivative and setting it equal to 0 gives MR = MC)

.5Q = 15 ⇒ Q = 30

P = 20 − .25(30) = 12.50

πMON = 30(12.50 − 5) = 225

The profit of a firm in oligopoly (Cournot) is

πA = (20 − .25(qA + qB))qA − 5qA

πA = (20 − .25qA − .25qB)qA − 5qA

dπA/dqA = 20 − .25qA − .25qB − .25qA − 5 = 0

(To make a point, refer back to the graph of A’s residual market demand; it was shifted down by expectations of how much B was going to produce. In this equation, the ‐.25qB represents that shift.)

20 − .25qB − .5qA = 5

15 − .25qB = .5qA

qA = RA (qB ) = 30 − .5qB

(Another point: if A is a monopolist, B doesn't produce anything, so qB=0 and qA would be 30, the same quantity our monopolist above produced.) Since B's cost is the same, this game is "symmetric," so going through the same process yields an identical reaction function:

qB = RB (qA ) = 30 − .5qA



Now let’s look at a graph of both players’ reaction functions. The RA is A’s reaction function, and the RB is B’s reaction function, while the horizontal axis is the quantity A produces, and the vertical axis is the quantity B produces. We said if B produces 0, A will want to produce 30; that is, why the horizontal intercept for A’s reaction function is 30. To find the vertical intercept for A’s reaction function, just solve the equation when qA = 0. By a similar process we can find the intercepts for B’s reaction function.

[Figure: Reaction functions RA and RB; RA has intercepts qA = 30 and qB = 60, RB has intercepts qB = 30 and qA = 60, and the Nash equilibrium (qANE, qBNE) lies at their intersection.]

Now, let’s think about the (only) equilibrium of this graph. Suppose player A thinks player B will produce 30 units. RA tells us what player A’s best response is to that quantity; so, follow the dotted green line to see what player A would want to produce. However, if A produces this quantity, follow the line again to RB to see what player B would want to produce. It is clear that this process will continue until both players reach the point of intersection, the green dot. It is at this point where each player is making a guess about the other player’s quantity, maximizing their profit with respect to that guess, and both players are happy. Looking at the graph we can see that this is the only reasonable solution to this game, since it’s the only point where the two players’ best responses intersect. We know we want the point where both lines intersect, and we have both players’ reaction functions; thus, we can just solve the two functions as a system of equations:

qA = RA (qB ) = 30 − .5qB

qB = RB (qA ) = 30 − .5qA

qA = 30 − .5(30 − .5qA )

(substituting in qB’s reaction function)

qA = 15 + .25qA

.75qA = 15 ⇒ qA = 20

(Note: If the game is symmetric, meaning the only difference between the two firms is their names, and there is only one Nash equilibrium, as in our game, you can save time by concluding that qA=qB. Thus, after finding player A's reaction function qA=30−.5qB, you could substitute qA for qB to get qA=30−.5qA, or 1.5qA=30, which gives the same answer of 20 for each firm. This is handy if you need to save time on an exam.) Since the game is symmetric, and there is only one equilibrium, we know qA=qB, so qB=20. We can now find industry quantity:

Q = qA + qB = 20 + 20 = 40



and referring back to our market demand curve to determine our price we get

P(Q) = 20 − .25Q = 20 − .25(40) = 10

so player A’s profit is

π A = 20(10 − 5) = 100

and since we know the game is symmetric, πB = 100 as well. Total industry profit is

πA + πB = 100 + 100 = 200

Notice that the total industry profit is 200 with competition, whereas the total industry profit in the case of a monopolist was 225. This should make intuitive sense; competition drives down profits. This is still higher than 0 profit, however, which was the case when we were considering price competition with two firms that had unlimited capacity.
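Results like these are easy to check numerically. Below is a minimal Python sketch (purely illustrative; the function names are ours, not from the text) that finds the Cournot equilibrium above by iterating the two best responses, and compares it to the monopoly benchmark.

    # A minimal sketch: symmetric Cournot with P(Q) = 20 - 0.25*Q and MC = 5.
    # The best response derived in the text is R(q_other) = 30 - 0.5*q_other.

    def price(Q):
        return 20 - 0.25 * Q

    def best_response(q_other):
        return 30 - 0.5 * q_other

    qA = qB = 0.0
    for _ in range(100):              # iterate best responses to a fixed point
        qA, qB = best_response(qB), best_response(qA)

    p = price(qA + qB)
    print(qA, qB, p, (p - 5) * qA)    # -> 20.0 20.0 10.0 100.0, as in the text

    # Monopoly benchmark from the text: Q = 30, P = 12.50, profit = 225.
    print(price(30) * 30 - 5 * 30)    # -> 225.0

The iteration converging to the intersection of the reaction functions is exactly the tracing process described for the graph above.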

Now, let’s generalize the graph qB of the reaction functions. Just as in our example, when player A produces nothing, player B will RA want to produce the monopoly quantity, and if player B produces MON nothing, player A will want to qB produce the monopoly quantity. If qBNE qA = monopoly quantity, we know RB MR<P since A would have to lower price to sell another unit; however, a monopolist that is maximizing qA qANE qAMON profit has set MR=MC, so this implies MC<P. Knowing this, player B will want to produce some positive quantity since he can sell for a higher price than his marginal cost. This implies B’s reaction function will be positive when qA=monopoly quantity, indicating he will want to produce some amount less than the monopoly quantity, but greater than 0. This concept is indeed why players engaging in Cournot competition face downward sloping reaction functions as shown above, and why the shape of their functions is the same regardless of particular numbers, linear vs. non‐linear demand, etc. Now that we have a framework for how oligopolies interact, we can see how managerial decisions will strategically impact an industry. One way to increase the value of your firm is to reduce cost. Let’s suppose player B invests in some technology that reduces his marginal cost. How would that impact the game?



We can look at the graph of the players' reaction functions to find out. There are two results from the reduction in marginal cost. Since B lowered his marginal cost, for the initial quantity that A was producing (qA0), B will want to produce more; therefore, his reaction function shifts out. At A's original quantity of qA0, player B will now produce q̃B1. This reduction in MC increases B's profits.

[Figure: B's reaction function shifts out from RB0 to RB1; at A's original quantity qA0, B moves from qB0 to q̃B1, and the new equilibrium is (qA1, qB1) with qA1 < qA0.]

As a result of player B’s increase in quantity, player A will reduce her quantity. This is because her residual demand, which is based on how much B produces, will fall. This decrease in qA will also increase B’s profits. To better interpret these results, let’s first discuss some terminology. Within the context of Cournot competition, we say that quantities are “strategic substitutes”; that means that the more player A produces, the less player B will want to produce. It’s basically a fancy way of saying that reaction functions slope down. In a strategic sense, A’s quantity substitutes for B’s, and vice versa. Knowing this, we see that the effect of a cost reduction in Cournot competition is two‐fold. First, player B experiences an increase in profits from the direct effect of having a lower marginal cost. This simply means that his cost per unit is lower so he is able to produce a higher quantity. Secondly, player B earns higher profits due to the strategic effect of having A lower her quantity as a result of player B’s increase in quantity. Again, this goes back to the fact that their reaction functions are negatively sloped. We see that the strategic effect moves in the same direction as the direct effect, and both of them increase B’s profits. Thus, the advantage of a cost reduction is magnified in Cournot competition. Another way a firm could increase profits is through advertising. The problem with advertising is that it will benefit every firm; this is because in Cournot competition, all products are identical. For example, if an orange juice producer buys advertising for orange juice, the market demand for all orange juice will increase, and since all products are the same, every orange juice producer will benefit from the increase in demand. In fact, each firm will benefit relative to their market share of the industry. This is why in homogenous industries, firms will usually get together and form trade associations to advertise. This way, each firm is contributing to the cost of advertising, since each firm benefits. Suppose firm A increases advertising. We know this grows A’s demand, so for a given level of B’s output, A will produce more. However, since the products are

identical, B’s level of demand will grow by approximately the same amount. The graph to below illustrates this point. As you can see, both reaction functions shift out, since each firm qB wants to produce more given the R RA0 A1 A ↑ advertising increase in demand. The direct effect of A’s increase in advertising is that she experiences greater demand, and thus earns higher profits. However, the strategic effect is that B will want to produce more due to him also RB1 experiencing greater demand; this will cause A to earn lower profits. We RB0 see that in the case of advertising qA with quantity competition and homogenous products, the direct effect is diluted by the homogenous nature of the products and the direct effect and the strategic effect work against each other. In summary, for a firm in a homogenous product industry, capacity limits become very important. For two firms without capacity limits, prices will be driven down to cost, and neither firm will make a profit. Firms in a homogenous product industry shouldn’t heavily pursue individual advertising, since the spillover effects and the strategic effects decrease the benefits from advertising. Instead, firms should try to get into trade associations when advertising to split up costs among all the firms that will benefit from it. When a firm invests in technology to reduce marginal cost, the benefits are magnified since the direct effect and strategic effect move in the same direction. Thus, the most beneficial thing a firm can do in a homogenous product industry is to cut cost more efficiently than the competition.

First Mover (Stackelberg) Quantity Competition

Up to now, we have assumed that player A picks her quantity given her "best guess" about player B's quantity, and player B picks his quantity given his "best guess" about player A's quantity. Let's now assume this is a sequential game, and that player A picks her quantity first. Remember, in our discussion of game theory we said this means player A essentially gets to dictate how the game will be played, because she knows how player B will respond to the quantity she picks. This does not necessarily mean A gains and B loses. Player B may be better off because he gets to maximize his profit based on a known quantity for A. Each game is different, and must be analyzed individually to see whether there is a first mover advantage or a second mover advantage. Since player A moves first, we know player B will set his quantity equal to

qB = RB (qA )



which intuitively says player B will maximize profit given player A’s quantity of qA. Since player A knows this is how player B will respond, she wants to incorporate this into her own profit function. Thus, her profit becomes

πA = P(qA + qB)qA − C(qA)

πA = P(qA + RB(qA))qA − C(qA)

where she has just substituted in B’s reaction function for qB. Maximizing profit:

dπA/dqA = P + (dP/dQ)(1 + dRB/dqA)qA − MCA = 0

which is the same condition we got when the players moved simultaneously, with the exception of the (dRB/dqA) term. What does this term signify? Intuitively, it's how much player B changes his quantity when player A produces one more unit. This is important to understand: we know the players' reaction functions slope downward, so as player A produces more, player B produces less. Now that player A moves first, she knows that after she sets her quantity, player B will set his based on what she chose. So, if she increases her quantity by one unit, player B will reduce his output by some amount; that amount is exactly what this new term represents.

Here is why that matters. In a simultaneous move game, if player A increases quantity by one, market quantity goes up by one, and price decreases by some amount; player B picks quantity at the same time player A does, and has no chance to respond to the increase. In this sequential move game, however, if player A increases quantity by one, B moves second and will produce some amount less (because of his negatively sloped reaction function). As a result, market quantity still increases, but by less than one, due to B's decrease in quantity. Thus, market price goes down by less than it did before: market price is less sensitive to player A's quantity in a sequential game. Because of this, player A will sell more when she is the first mover than when the game is simultaneous.



Example

Let’s verify this using our specific example from earlier. Remember, constant marginal cost is $5, and P(Q) = 20 − (Q/4) . We also know that RB = 30 − (qA /2) . So

⎛ q (30 − (qA /2) ⎞ π A = ⎜20 − A − ⎟qA − 5qA ⎝ ⎠ 4 4

π A = ⎜20 − 7.5 −

⎛ q ⎞ π A = ⎜12.5 − A ⎟qA − 5qA ⎝ 8⎠

⎛ ⎝

qA qA ⎞ + ⎟qA − 5qA 4 8⎠

Maximizing:

Total market quantity is

qA = 5 = MCA 4

MRA = 12.5 −

q 7.5 = A ⇒ qA = 30 4

Q = qA + qB = 30 + 15 = 45

and price is

P (Q) = P (45) = 20 −

45 = 8.75 4

Solving for B’s quantity:

RB (30) = 30 −

30 = 15 2

We can see in the Cournot game each player produced 20; by moving first, A produces more (30) which causes B to produce less (15), but not one for one, so total industry output has increased from 40 to 45. Since total output has increased, price decreased from 10 to 8.75. Looking at their profits we see

πA = 30(8.75 − 5) = 112.50, πB = 15(8.75 − 5) = 56.25

and total industry profit is πA + πB = 112.50 + 56.25 = 168.75.

Notice in the example that industry profit is down from Cournot competition, but player A's profit is up. This might seem counterintuitive given what we usually mean by a "competitive market": in Cournot competition, where each player has 50% of the market, the market seems more competitive, so prices should be closer to marginal cost. However, we've just shown that in Stackelberg competition, where one firm has a greater market share than the other, the industry has actually become more competitive: prices have decreased and are thus closer to marginal cost. The reason is that player A increases quantity by more than B decreases quantity, so total output is higher and price is lower. The take-away is that prices and quantities in an industry depend on more than just how many firms there are.
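The sequential outcome can be checked numerically as well. Here is a minimal sketch (again illustrative, not from the text) that computes the Stackelberg outcome by searching over the leader's quantity, with the follower always on the reaction function derived earlier.

    def price(Q):
        return 20 - 0.25 * Q

    def follower(qA):                  # B's reaction function: RB(qA) = 30 - qA/2
        return 30 - 0.5 * qA

    def leader_profit(qA):             # A's profit, anticipating B's response
        return (price(qA + follower(qA)) - 5) * qA

    qA = max((i * 0.5 for i in range(121)), key=leader_profit)  # grid over [0, 60]
    qB = follower(qA)
    p = price(qA + qB)
    print(qA, qB, p)                            # -> 30.0 15.0 8.75
    print(leader_profit(qA), (p - 5) * qB)      # -> 112.5 56.25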



Long-run Equilibrium

Earlier, we said that in the long run, if firms are making a profit in an industry, other firms tend to enter until each firm's profit is 0, unless there are barriers to entry. Remember, barriers to entry could be legal, or simply economies of scale. Let's now consider the long-run implications of a homogenous product industry.

Consider the most efficient firm that has not yet entered the market. Assume the demand that is "left over" in the industry is dresidual, and that it is the demand curve this firm faces. The firm has a long-run average cost represented by the LRAC curve.

[Figure: The residual demand curve dresidual lies above LRAC over some range of quantities; entry shifts residual demand down to d'residual, which lies everywhere below LRAC.]

Since there are quantities along the residual demand curve with a higher price than the LRAC curve, the firm will enter, as it can make a profit. We said that in equilibrium, firms don't make a profit; the only way for this to be the case is if the residual demand left for a potential entrant is always below its LRAC curve. Thus, as firms enter, the residual demand shifts down to d'residual. When a firm faces this demand, we say it is the marginal firm, in that it is just barely willing to enter. Of course, there may be no firm that is exactly indifferent to entering. The last firm that actually enters may make a profit, while the next firm in line would make a loss if it entered.

Example

Let’s look at an example using the marginal firm. Assume, as before, that P = 20 − (Q/4) and the cost function for firm 4 is C4 (q4 ) = 5q + F where F is the fixed costs. Now suppose that there are n = 4 firms in the industry, and that q1 + q2 + q3 = 40 in the Nash equilibrium. Will entry occur? If there is profit left in the industry, more firms will enter; so, to find out whether this is the case, we need to find out firm 4’s profit.

π4 = (20 − 40/4 − q4/4)q4 − 5q4 − F

π4 = (10 − q4/4)q4 − 5q4 − F

MR4 = 10 − q4/2 = 5 = MC4

q4 = 10 and Q = 40 + 10 = 50

so price is



P(Q) = P(50) = 20 − 50/4 = 7.50

which means firm 4’s profit is

π4 = (7.50 − 5)10 − F = 25 − F

So if F > 25, firm 4 will exit, and there are too many firms in the industry. If F < 25, firm 4 is making a profit, which means further entry may occur. Entry won't necessarily occur, because there may only be room for four firms; to figure out whether a fifth firm would enter, simply repeat the above calculation and see if it makes a profit. If F is exactly 25, firm 4 is the "marginal" firm and makes exactly 0 profit.
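The entry calculation is just as mechanical in code. Here is a minimal sketch of it (the function name and the trial values of F are purely illustrative):

    # Firm 4 faces residual demand P = 20 - (40 + q4)/4, cost 5*q4 + F,
    # and produces q4 = 10 (from MR4 = 10 - q4/2 = 5, as derived above).

    def entrant_profit(F, q_incumbents=40.0, q4=10.0):
        p = 20 - 0.25 * (q_incumbents + q4)
        return (p - 5) * q4 - F

    for F in (20, 25, 30):
        pi = entrant_profit(F)
        print(F, pi, "enter" if pi > 0 else "marginal or stay out")
    # -> 20 5.0 enter / 25 0.0 marginal or stay out / 30 -5.0 marginal or stay out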

Now let’s generalize long‐run oligopoly theory. We know in the long‐run that in a given industry no excess profit will be available; if there were, more firms would have entered. Suppose it takes n firms to eliminate all profit potential for additional firms to make a profit. Firm (n+1) firm will not enter because they would make a loss if they did so. Firm n is the marginal firm if the residual demand left over for firm (n+1) is never be above its long‐run average cost curve. This is illustrated below. The demand that firm n+1 faces is the dresidual,n+1 p LRACn+1 line, and its average cost Total Output curve is LRACn+1 curve. Notice that the price that customers are willing to D pay (given by dresidual,n+1) is never high enough to cover the firm’s costs dresidual,n+1 (given by LRACn+1), which is consistent with long‐ run theory. If the D line q gives the demand for the entire market, the horizontal distance shown by the dotted line represents the total output of the first n firms. This total output is the Nash equilibrium quantity with n firms entering and all other firms not entering. Now let’s look at the general mathematics of our long‐run model. Let i be a firm in the industry, and let there be i=1, 2, …, n firms with i=1 being the most efficient, i=2 being the second‐most efficient, etc. and i=n being the least efficient firm that is still in the industry. The profit for a random firm j is

πj = p(∑ qi)qj − Cj(qj), with the sum running over i = 1, …, n.



To clarify, ∑qi is simply the total industry output Q. Then, if firm j is in the market, we know

MRj = MCj (for all firms in the market)

since they are maximizing profit. We also know that for firm n, which is the least efficient firm that is in the market,

πn ≈ 0

since it is the marginal firm. Another way of saying this is that the marginal firm barely breaks even. How precisely this holds, however, also depends on the degree of economies of scale and on the details of input markets and industry structure. If economies of scale are significant, and, say, only two firms can enter before a third firm would drive profits below 0, the two firms may make a profit greater than 0; if economies of scale are small, though, and there are 25 firms, firm n's profit will be close to 0. Finally, since firm n makes approximately 0 profit, this last condition can be stated as p ≈ LRACn. The two conditions that define long-run equilibrium in an industry are then: 1) MRj = MCj (for all n firms in the market) and 2) p ≈ LRACn.

Another important observation follows. We know MRj = MCj is the same as

p + (dp/dQ)Qsj = MCj

and as sj (the market share) goes to 0 (that is, as the number of firms in the industry increases), p approaches MC. Since p ≈ LRACn in long-run equilibrium, MC must approach LRAC for the marginal firm as more and more firms fit in the industry. In other words, as economies of scale become smaller, the industry equilibrium gets closer and closer to one in which firms produce and price at minimum LRAC.
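To see these long-run conditions at work, here is a minimal sketch (assuming, purely for illustration, symmetric firms facing the linear demand from this chapter's examples, MC = 5, and a fixed cost F = 20; none of these numbers come from the text) that adds firms until the next entrant would make a loss.

    def per_firm_profit(n, a=20.0, b=0.25, c=5.0, F=20.0):
        # Symmetric n-firm Cournot with P = a - b*Q: each firm produces
        # q = (a - c) / (b*(n + 1)), so per-firm profit is b*q**2 - F.
        q = (a - c) / (b * (n + 1))
        return b * q * q - F

    n = 1
    while per_firm_profit(n + 1) >= 0:   # firms keep entering while profitable
        n += 1
    print(n, per_firm_profit(n), per_firm_profit(n + 1))
    # -> 5 firms fit: profit is 5.0 per firm at n = 5 and about -1.6 at n = 6,
    #    so firm 5 is (approximately) the marginal firm in this illustration.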



Chapter 15 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Oligopoly – A type of market structure with at least two firms, each having significant market share. Since there is a relatively small number of big players, the managerial decisions each firm makes have an impact on the demand the other firms experience.

Monopoly – A type of market structure with a single firm that faces no competition and has complete autonomous control over setting prices and production levels.

Perfect Competition – A type of market structure with many firms, where each individual firm has no control over price levels; they are price takers. These firms have no strategic decisions to make, since they take their price from the market. Nothing they do will have any impact on the other firms in the market.

Barriers to Entry – Obstacles faced by potential entrants to a market. These could include legal barriers, such as regulations or patents, or economies of scale that allow an incumbent firm to produce a given output more cheaply than two firms could.

Homogenous Products – Products that are identical, making price competition fierce.

Homogenous Product Bertrand Competition – A model that shows the fierceness of price competition. It represents two firms competing in price, producing homogenous products, and sharing the same marginal cost, in which the price they charge is driven down to the competitive level (at cost) and no profit is made. This model assumes unlimited capacity.

Durable Good – A good that lasts a long time. A monopolist selling a durable good today would be in competition with himself tomorrow, and so on, until price is driven down to cost.

Cournot Competition – A model that represents a homogenous product oligopoly in which firms have capacity limits and move simultaneously. The firms in this model use reaction functions to decide how much to produce.

Strategic Substitutes – Within Cournot competition, the quantities produced by the firms. The more one player produces, the less the other player will produce. This means the firms' reaction functions slope down, because one firm's best-response quantity is inversely related to the quantity produced by the other firm, and vice versa.

Direct Effect – The benefits experienced by a firm that tries to get ahead of its competition by either reducing its marginal cost or increasing its advertising expenditure.



Strategic Effect – The counteracting or magnifying effect caused by a competitor's reaction to a firm's decision to reduce marginal cost or increase advertising expenditures.

Stackelberg Competition – A sequential move model that represents two firms competing on quantity, in which one firm is the leader and the other is the follower.

Residual Demand – The demand left over for the most efficient firm not yet in the industry. If this is above its long-run average cost curve, the firm will enter. If it is below LRAC, it will stay out.

Marginal Firm – The firm that is just on the margin of entering the market: almost efficient enough to enter, but unable to, because more efficient firms have already established their place and the residual demand left for it lies below its LRAC curve.



Chapter 16 Differentiated Product Markets

In the previous chapter, we assumed products were completely identical, and customers simply bought from the cheapest firm. In the real world, however, perfectly homogeneous goods rarely exist. Even if two products are identical, their location can itself be an aspect of differentiation; consider buying gasoline, and how, all else equal, you'd rather buy from a convenient location. The fact that products are differentiated ultimately means that the effects of price competition are dampened; that is, undercutting an opponent won't take away all of his demand, since there are certain customers who strictly prefer his product.

Let's look at a model for products that are identical except for their location; this is called spatial differentiation. Imagine a city with only one street, with one firm located at each end. There are N customers uniformly distributed along the street, and the length of the street is 1 unit (miles, blocks, kilometers, whatever). Let 0 be the location of firm A and 1 be the location of firm B. Each customer is located at some point x between 0 and 1. This city is represented in the figure below.

[Figure: A street of length 1 with firm A at 0 and firm B at 1. A customer at location x has willingness to pay v and transportation cost t per unit of distance, so the net value of buying from A is v − tx and from B is v − t(1 − x).]

Each consumer’s reservation value for the product is v. Consumers incur a (round trip) transportation cost of t per unit of distance traveled. If a consumer located at point x (somewhere between 0 and 1) buys from firm A at location 0, they have to travel a one‐way distance of x to make the purchase. Thus, the value they receive net of transportation cost is v­tx. If they buy from firm B, they have to travel 1­x units of distance, and, the net value they receive is v­t(1­x). The net values purchasing from each firm as a function of location (x) are also shown in the figure above. These net values (reservation price less transportation costs) represent the maximum amount a consumer located at x would be willing to pay for each firm’s product. Willingness to pay declines with distance, due to transportation costs.



Consumer surplus is the difference between a consumer's willingness to pay and what they actually pay. For a consumer located at x, their surpluses if they buy from A or B, respectively, are:

SA = v − tx − pA and

SB = v − t(1 − x) − pB.

Each consumer will buy from whichever firm offers them the highest surplus. Assuming the firms' prices are such that some consumers buy from each, a consumer at some location x̄ will be indifferent between the firms. To find this location, just equate the surpluses and solve:

SA = SB

v − tx̄ − pA = v − t(1 − x̄) − pB

2tx̄ = t + pB − pA

x̄ = (t + pB − pA)/2t

This location, x̄, defines the point of indifference. Everyone closer to A will buy from A and everyone closer to B will buy from B. Since consumers are uniformly distributed, and since x̄ represents the fraction of the distance between 0 and 1 where the indifferent consumer lies, x̄ represents the fraction of the N consumers who buy from A and (1 − x̄) the fraction who buy from B. Thus, demands are:

qA = Nx̄ = N(1/2 + (pB − pA)/2t)      (16.1)

and

qB = N(1 − x̄) = N(1/2 + (pA − pB)/2t).      (16.2)

Looking at the demand functions, we can make some conceptual connections. First, notice that if pA=pB, then qA and qB are both N/2. This should make sense: if the only difference between firm A's product and firm B's product is location, and they charge the same price, each firm gets the half of the market that is closest to it.

Earlier we defined t to be the "transportation cost per unit of distance," and we said that this is the only thing that differentiates the products. Now, however, we can think of geographic distance as a metaphor for any characteristic that differentiates the products, and of t as a measure of how important that differentiation is. The higher t is, the more differentiation matters, since a customer loses more surplus the "further" away they are from their ideal product.

Looking back at our formula, we see that t governs the impact of the difference in prices (pB − pA). If t is very small, the effect of the price difference on quantity is magnified (since you are dividing by a small number). This should make sense: the smaller t is, the more homogenous the products are (the less the difference matters to consumers), and thus the more drastic the effects of price competition, just as when we first talked about price competition with homogenous products. If t is very large, the effect of the price difference on quantity is small. A larger t means more emphasis on differentiation, and as we just showed, undercutting takes less of the market away from your competitor when non-price differentiation is more important.

Now let's model this as a simultaneous play game where firms compete on prices, and solve for the Nash equilibrium. This is known as differentiated product Bertrand competition. Assume both firms have an identical constant cost per unit of c. Then:

πA = (pA − c)N(1/2 + (pB − pA)/2t).      (16.3)

Maximizing:

dπA/dpA = N[1/2 + (pB − pA)/2t + (pA − c)(−1/2t)] = 0      (by the product rule)

dπA/dpA = N[1/2 + pB/2t − pA/2t − pA/2t + c/2t] = 0      (simplifying)

t + c + pB − 2pA = 0      (multiplying both sides by 2t/N)

pA = RA(pB) = (c + t)/2 + pB/2      (solving for pA)

So maximizing firm A’s profit with respect to price (since now price is what the firms are choosing, not quantity) we get a reaction function for A’s best price given a “best guess” about what price firm B will charge. Just as in Cournot competition, we can graph this reaction function, but instead of modeling responses to estimates of quantity, we will be modeling responses to estimates of price.

[Figure: Upward sloping reaction functions RA and RB in (pA, pB) space, each with intercept (c+t)/2.]


The stark difference between reaction functions in differentiated product price competition and those in quantity competition is that they slope up. This is because as player A raises her price, player B will want to raise his as well; remember, in Cournot (quantity) competition, as player A raised her quantity, player B wanted to lower his quantity. The Nash equilibrium is simply where the reaction functions cross, just as it was in Cournot competition. We have player A's reaction function, and since this game is symmetric, we know B's reaction function mirrors A's. So we can just use symmetry to solve, since in equilibrium pA=pB. Thus:

pA = (c + t)/2 + pA/2

pA/2 = (c + t)/2

pANE = pBNE = c + t      (16.4)

Note that the players’ prices are the same only because the game is symmetric. Below, the solution is added to the figure.

[Figure: The same reaction functions with the Nash equilibrium marked at pA = pB = c + t.]

We can now see how, as t → 0, pNE → c. Since we said t can be seen as a metaphor for any characteristic that differentiates player A's product from player B's, this should make sense. When t is 0, this (metaphorically) means there is no difference between the two products; thus, price approaches the constant per unit cost, just as it did when we examined price competition in homogenous product industries. The conclusion is that price rises above cost if (a) there are capacity limits, as we saw in Cournot competition, or (b) there is differentiation, as we've seen in Bertrand competition. Now let's look at what's true in general for differentiated products. Assume that MCA=cA and MCB=cB. We know that how much A sells depends on how much A charges, but also on how much B charges. Thus, qA(pA,pB), and

πA = (pA − cA)qA(pA, pB).



Maximizing:

dπA/dpA = qA + (dqA/dpA)(pA − cA) = 0.      (using the product rule)

Notice that when A maximizes her profit, B's price (potentially) shows up in two places: since qA is a function of both pA and pB, B's price influences the first qA term, as well as the (dqA/dpA) term. Thus, when you solve this equation for A's price, you will get a reaction function that depends on B's price, RA(pB). The same is true for B, whose price will be a function of A's price, RB(pA). The graph of the reaction functions will be upward sloping, as they were in the last example: when competing on price, firm A will want to respond to an increase in B's price by increasing her own price, at least as long as prices are below the monopoly level (which they will be, due to competition).

[Figure: Upward sloping reaction functions RA and RB in (pA, pB) space.]

Example

Assume qA = 10 − pA + .5pB, qB = 18 − 2pB + .5pA, and that cA = 2 and cB = 1. Solving for A's reaction function:

πA = (10 − pA + .5pB)(pA − 2)

dπA/dpA = 10 − pA + .5pB − pA + 2 = 0      (product rule)

2pA = 12 + .5pB

pA = RA(pB) = 6 + .25pB

and solving for B’s reaction function

πB = (18 − 2pB + .5pA)(pB − 1)

dπB/dpB = 18 − 2pB + .5pA − 2pB + 2 = 0      (product rule)

4pB = 20 + .5pA

pB = RB(pA) = 5 + .125pA

To solve for the Nash equilibrium, find where the reaction functions intersect:

pA = 6 + .25 pB and pB = 5 + .125 pA

pA = 6 + .25 (5 + .125 pA )



pA = 7.25 + .03125 pA

pA = 7.48

so

pB = 5 + .125(7.48) = 5.94

and these are the players’ respective Nash equilibrium prices.

Price (Bertrand) Competition with a First Mover

First, it's important to understand that the first mover in this game will do at least as well as in the simultaneous Nash equilibrium. The reason is that the simultaneous game had a pure strategy equilibrium where each player played their best response to the other. If the first mover sets the same price moving first that they would have set moving simultaneously, the second mover will respond in the same way, so equilibrium prices and payoffs would be the same. The first mover has additional control over how the game is played, since he sets the stage for the second mover. Knowing how the second mover will respond, the first mover would never set a price that, after the second mover moves, gives him less profit than he could achieve by choosing the same price as in the simultaneous move game.

Relative to the original Nash equilibrium price, what will the first mover do? Remember, in the simultaneous move Nash equilibrium, both players are on their reaction functions. That means the net marginal benefit, NMB, of a change in price is 0 for both players. (Setting NMB equal to 0 defines the reaction function with continuous strategies.) So, a slight increase or decrease in the first mover's price has no significant impact on their own profit. However, an increase in the second mover's price would boost the first mover's profit. Since the first mover knows the reaction functions slope up, they can get the second mover to charge a higher price by increasing their own price. This boosts the first mover's profits. Since the first mover chooses to increase price when they did not have to, their profits will be higher than in the simultaneous move game, due to the induced increase in the second mover's price.

The second mover, however, benefits from the increase in the first mover's price AND gets to choose their own price optimally in response. This gives them an opportunity to raise their price somewhat while also undercutting the first mover relative to the simultaneous move game. Thus, the second mover profits relatively more than the first mover. Therefore, when there is a first mover, both firms obtain higher profits, but both firms would rather be the second mover than the first. In general, if player B is the first mover, his profit will look like the following:

πB = (pB − cB)qB(pB, RA(pB)).

We substituted RA(pB) for firm A's price because, since B moves first, B knows A will respond according to her reaction function. Thus, when choosing his price, the first mover knows that the second mover's price is dictated by his own choice. Maximizing B's profit:



dπB/dpB = qB + (pB − cB)[dqB/dpB + (dqB/dpA)(dpA/dpB)].

Compared to the simultaneous game, the second term in brackets, (dqB/dpA)(dpA/dpB), is new. Remember, B is moving first. The new term says that player B's price influences player A's price through the reaction function (dpA/dpB), and that player A's price influences player B's quantity (dqB/dpA). Intuitively, this term says that since player B moves first, he knows the price he charges will affect the price A charges, which in turn will affect the quantity B sells.

Example

From our earlier example, qB = 18 − 2pB + .5pA and cB = 1. We also found player A's reaction function to be RA(pB) = 6 + .25pB. Find the equilibrium prices assuming B moves first.

πB = [18 − 2pB + .5(6 + pB/4)](pB − 1)

dπB/dpB = 18 − 2pB + 3 + pB/8 + (−2 + 1/8)(pB − 1) = 0      (using the product rule)

21 − 1.875pB − 1.875pB + 1.875 = 0

pB = 22.875/3.75 = 6.10 > 5.94

Player B indeed raises his price from 5.94 to 6.10 when he is the first mover. To find player A’s price, use her reaction function:

RA(6.10) = 6 + 6.10/4 = 7.53 > 7.48.

Player A raises her price as well. But it took a relatively large increase in B's price to induce a relatively small increase in A's price. If we were to plug these prices back into the players' profit functions, we would see that both players' profits increased, but A's increased relatively more. The reader should verify this.
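Here is one way to do that verification (a minimal sketch; the profit numbers are our calculations, not the text's):

    def qA(pA, pB): return 10 - pA + 0.5 * pB
    def qB(pA, pB): return 18 - 2 * pB + 0.5 * pA

    def profits(pA, pB):
        return (pA - 2) * qA(pA, pB), (pB - 1) * qB(pA, pB)

    pA0 = 7.25 / (1 - 0.03125)              # simultaneous equilibrium prices
    print(profits(pA0, 5 + 0.125 * pA0))    # -> (~30.07, ~48.72)

    pB1 = 6.10                              # B moves first; A responds via RA
    print(profits(6 + pB1 / 4, pB1))        # -> (~30.53, ~48.77)
    # Both profits rise, but A's (the second mover's) rises by more.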

Incentives for Cost Reduction

Let's see how a cost reduction by one player affects price competition in a differentiated product market. Recall that players competing in Bertrand (differentiated) markets face upward sloping reaction functions. Let's see what happens if player A invests in some technology to reduce her marginal cost. The changes are depicted in the two panels of the figure below.



[Figure: Left panel shows A's demand DA, marginal revenue MRA, and marginal cost; when MCA falls to MCA', A's profit-maximizing price falls from pA0ne toward pA'. Right panel shows A's reaction function shifting left from RA to RA'; the new equilibrium prices are pA1ne and pB1ne, both lower than before.]

Initially, from the left panel of the figure, we see player A wants to charge a lower price (moving from pA0ne to pA') since her MR now equals her MC at a higher quantity. Since player A wants to charge a lower price regardless of what price B charges, her reaction function RA shifts left (remember, A's price is on the horizontal axis, so for her price to decrease, her reaction function must shift left). If player B did not respond to this new reaction function, and continued charging pB0ne, player A would be content charging pA'. However, since player B is rational, he reacts to A's lower price by charging a lower price himself, moving along his reaction function. Because of this strategic effect, player A will in turn lower her price even more. The new Nash equilibrium prices are pA1ne and pB1ne.

There are a few things going on here. The direct effect of player A's cost reduction is a decrease in A's price and an increase in A's profits. Since player A lowered her price, player B loses some market share; as a result, he lowers his price to win some back (just as his reaction function tells us). This strategic effect takes back some of B's lost market share from A, decreasing A's profits. The fact that the strategic effect moves in the opposite direction of the direct effect in price competition dampens the benefits of a cost reduction. Recall that these effects moved together when firms competed in quantity (Cournot competition). Thus, in Cournot competition, firms wanted to move first to get a bigger share of the market, and had greater incentives to invest in cost reductions, as the strategic effect enhanced the direct effect. In Bertrand competition, firms want to move second, to raise price while undercutting their competitor, and they have less incentive to engage in cost reduction, as the strategic effect lessens the gains from the direct effect. Now let's consider advertising.



Advertising

When firms were competing in a homogenous industry, advertising only made sense if all of the firms shared in the cost. This is due to the spillover effects of advertising for identical products. In differentiated markets, however, firms have a greater incentive to advertise, since spillover effects are lessened. We can classify two different types of advertising as follows.

First is informative advertising. Imagine the following is the distribution of potential customers for a certain type of product. How far along the line you are represents your preference for a certain characteristic (distance to the product, sweetness of cola, etc.).

[Figure: A distribution of potential customers along a line, with firms A and B located at different points and a dotted line marking the customers indifferent between them.]

Imagine firms A and B are located as above. We know customers who are "closer" to either firm will buy from that firm, assuming equal prices; thus, customers on the right side of the dotted line will buy from firm A, and those to the left will buy from firm B if prices are equivalent. However, this assumes that all the customers are perfectly informed about where firm A is and where firm B is. If customers weren't aware of firm B's location, it's possible those to the left of the dotted line would buy from A, although they'd prefer to buy from B if they knew it existed. Thus, firm B may invest in informative advertising simply to inform potential customers that it exists. This kind of advertising is good for society, since it simply informs customers. It is also good for managers, because it creates awareness of their product.

Next is persuasive advertising. Assume we have the same picture as above. If firm A were to invest in persuasive advertising, it would be attempting to shift consumer preferences in favor of A's product as opposed to B's product. Graphically, this looks like a shift in the distribution of potential consumers, shown below.

[Figure: The distribution of potential customers after A's persuasive advertising, shifted so that more customers are closer to A.]

Now, the group of customers closer to A (still to the right of the dotted line) is much larger than the group closer to B. This advertising may or may not be beneficial to society. If both firms invest in persuasive advertising and end up canceling out each other's effects, the money has just been wasted. From a manager's perspective, though, it becomes a question of maximizing profit. If we add advertising to the firm's profit function, it becomes

π = pq(p, A) − C(q(p, A)) − A

which means that the quantity the firm sells now depends both on the price it charges and on its amount of advertising. Since there are two variables under the firm's control, maximizing profit requires two partial derivatives:

∂π/∂p = 0 and ∂π/∂A = p(∂q/∂A) − (dC/dq)(∂q/∂A) − 1 = 0.

This gives us two equations in two unknowns (price and advertising), so we can solve for each variable. Looking at the second condition, we know the derivative of cost with respect to quantity (dC/dq) is marginal cost, so rewriting the equation we get

(p − MC)(∂q/∂A) = 1.

This says the marginal benefit of advertising should equal its marginal cost. The marginal benefit of advertising is the marginal profit per unit sold (p−MC) times how many more units you sell for one more dollar of advertising (∂q/∂A). The cost of spending another dollar on advertising is 1. A numerical example will clarify: suppose price is 4 and MC is 1, so marginal profit per unit is 3. If one more dollar of advertising increases sales by 0.5 units, then

(4 − 1) · 0.5 = 3 · 0.5 = 1.5 > 1.

Since the revenue generated from one more dollar of advertising (1.5) is greater than the cost of one more dollar of advertising (1), you should spend more on advertising. In other words, you should keep expanding advertising until the marginal benefit of advertising equals its marginal cost, or MBA = MCA.
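In code, the rule is a one-line comparison (using the illustrative numbers above):

    p, mc = 4.0, 1.0
    dq_dA = 0.5              # extra units sold per extra dollar of advertising
    mb = (p - mc) * dq_dA    # marginal benefit of one more advertising dollar
    print(mb, mb > 1.0)      # -> 1.5 True: keep spending until MB falls to 1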



If B’s advertising has no direct effect on A’s demand, instead only effecting it indirectly through changes in prices, in equilibrium prices will change from p0 to p1. However, B’s additional advertising may take a significant number of customers from A, lowering A’s demand curve at any given price. Firm A will respond by lowering price (to maintain some customers) ‐ A’s reaction function shifts to the left. In this case, the new prices are pA1’ and pB1’.

[Figure: B's reaction function shifts up from RB to RB' after his advertising increase. If A's demand is unaffected, prices move from (pA0, pB0) to (pA1, pB1); if A's demand falls and RA shifts left to RA', the new prices are pA1' and pB1'.]

We won’t be running through a concrete example in which both price and advertising are chosen strategically, primarily because the algebra becomes very tedious when there are 4 equations to solve for 4 unknowns. But, but it is important to remember that when choosing advertising a firm needs to be aware not only of the ideal price and advertising level it should choose holding its opponents actions constant, but, also of how the firm’s advertising affects demand for its competitors, and, that changes in advertising will impact equilibrium prices and vice versa.

Long Run Equilibrium

Just as with homogenous competition, for the long run to be in equilibrium, the marginal firm must not make any profit, or more firms would enter. In other words, it must be the case that the residual demand the marginal firm faces never reaches its long-run average cost curve. The residual demand faced by a firm in a differentiated industry may at first not seem analogous to a homogenous industry; however, it can be viewed in a similar way, since differentiated products are still substitutes for one another, albeit not perfect substitutes.

[Figure: The residual demand dresidual faced by the (n+1)th firm lies everywhere below LRACn+1 when the market can sustain n firms in equilibrium.]

The figure illustrates the residual demand faced by the (n+1)th firm if the market can sustain n firms in equilibrium. Since the marginal firm doesn't make any profit in the long run, it may be tempting to say that in the long run, firms that are in the market don't have to worry about strategy: market forces drive profits to zero. This is not the case. The reason firms in the market must continue to think about strategy and good management is that they still owe their shareholders a fair rate of return on their investment, whatever that is. When we say the marginal firm makes zero profit in the long run, the profit we're talking about is economic profit; that is, profit in addition to normal returns.

Also, if an industry is limited by economies of scale so that it supports only relatively few firms, strategy is important even if the marginal firm makes zero profit. Basically, if there are large economies of scale, and there is only room for three or four firms, strategic decisions can help firm two take profits away from firm three, for instance. On the other hand, if economies of scale are small, there will be many small firms, and strategy is not as important. The name for this type of industry is "monopolistic competition." However, since the products are differentiated, strategy is still somewhat important "locally" (as opposed to "perfect competition"). Imagine two fast-food restaurants across the street from one another. Even though their individual strategic decisions won't much impact the entire fast-food industry, they are highly concerned with each other's decisions, since those decisions directly impact their potential customers. So, with differentiated products, even if firms are small, a firm's choices have non-negligible impacts on its "closest" competitors.

Summary

Strategies and outcomes vary greatly between homogenous products and differentiated products, and between quantity competition and price competition. With identical products, since firms will not build lots of excess capacity unless there is a specific reason, competition tends to be about quantity and capacity, not about price directly. That means cost management is important, advertising less so, and there is a first mover advantage.

As soon as there is some aspect of differentiation between the products, it is possible for competition to be about price, not just capacity. With differentiated products and price competition, reaction functions slope up. Cost management, while important, is less important than with quantity competition. Advertising now becomes an important strategic consideration. And it is better to move second than first, though moving first still beats moving simultaneously: firms have an incentive to be the second mover in order to exploit the first mover's posted price.

It is possible to have quantity competition with differentiated products. This kind of model is especially important in cases where lead times in production are significant and the degree of substitutability is high, though not perfect. Then strategic decisions are more about capacity than price. But the imperfect substitutability means there is room for advertising, and the strategic effects of cost management alone are not as large as they would be for identical products.

Ultimately, it's a question of how differentiated a product really is. The more homogenous a product, the greater the incentive to move first and capture a larger market share, the more important cost management, the more likely capacity has major strategic implications, and the less important individual advertising. The more differentiated a product, the less important capacity and cost management become, and the more important individual advertising becomes.



Chapter 16 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Differentiated Products – Products that are not identical, making price competition less fierce because factors other than price are important.

Transportation Cost – A metaphor for any characteristic that differentiates products. A higher transportation cost means product differentiation is more important, since the customer loses more surplus the farther away they are from their ideal product.

Consumer Surplus – The benefit received by consumers who can buy a product for less than their willingness to pay for it.

Differentiated Product Bertrand Competition – A model that shows how fierce price competition can be dampened by the presence of other factors. Reaction functions slope up because the price charged by one firm is directly related to the price charged by the other firm.

Informative Advertising – A firm's advertising meant to let customers know that its product exists. This type of advertising is good for society, since it simply informs customers.

Persuasive Advertising – A firm's advertising meant to shift consumers' preferences in favor of its product as opposed to a competitor's product. If both firms engage in this type of advertising and their effects cancel each other out, the money has been wasted.

Product Positioning – Occurs when a firm introduces a new product, similar to its other products, to take advantage of a vulnerable niche in the market. Even though the firm loses some profit by taking demand away from its existing products (cannibalization), it may benefit by preempting another firm's entry, increasing profit in the long run.



Chapter 17 Perfect Competition

In the long run, in an industry producing a reasonably homogenous product, the strategic effects of the decisions of individual firms become trivial if economies of scale are so small that "many" firms must enter the industry before profits are eliminated. Think of the global market for wheat; a single Kansas wheat farmer's strategies have insignificant effects on what a European farmer decides to do. We can see this directly from an expression for marginal revenue derived in chapter 12. Letting s represent an individual firm's market share, q/Q, the marginal revenue for firm i, a firm in a homogenous market, is:

MR_i = p + \frac{dp}{dQ} q_i = p + \frac{dp}{dQ} Q s_i .

This says that the revenue a firm generates by selling one more unit is the price (p) that it charges, plus the effect the sale itself has on price (dp/dQ) times how many units the firm is selling. Every time a firm sells another unit, it drives down the market price (because of the law of demand), which means everyone selling output in the market is going to get less revenue. So, the firm selling the extra good drives down price a little bit for everyone (including itself), but collects the price on the new good it just sold. The effect of the price reduction on all of the other units it would have sold anyway is only relevant for the part of the market the firm itself owns, si. In other words, it is only “losing” profits from the price reduction on its own market share (si). As the number of firms increases, each firm’s market share decreases, and as si approaches 0, marginal revenue approaches price (since the second term in the above equation falls out). If a firm is small enough, it can just ignore the effect of its sales on market price when choosing quantity. So, for it, marginal revenue and price are the same thing. So, firms that are “small enough” can just be modeled as price takers. For them, setting MR=MC means setting p=MC.

This turns out also to be the case with differentiated product markets when economies of scale and market size allow many firms to enter. This is because even though there may be several firms selling slightly different products, they are still substitutes for one another. As the product space becomes more and more tightly packed with the products of competing firms, every individual consumer finds that they have increasingly appealing substitutes for their preferred product. That means the demand for any individual firm’s product becomes increasingly elastic as more and more firms introduce products. Mathematically, we cannot combine all the individual firms’ quantities, q, into a total market quantity, Q, since the products are not the same thing. So, we cannot work from the expression for marginal revenue given above. We can, however, work from an alternative expression. Recall the following expression for marginal revenue:



MR_i = p_i + \frac{dp_i}{dq_i} q_i = p_i + \frac{dp_i}{dq_i} \frac{q_i}{p_i} p_i = p_i \left( 1 + \frac{1}{\eta_i} \right) .

As the number of substitutes increases, the absolute value of customers’ elasticity (η) increases, so the ratio (1/η) approaches 0 and marginal revenue approaches price. With differentiated products, as the number of good substitutes becomes large, marginal revenue again becomes essentially equal to price and MR=MC becomes p=MC.

The model of perfect competition, then, just begins from the assumption that firms are price takers, so that it can analyze market outcomes while ignoring the intricate complexities that arise from strategic interdependence. It is applicable when differentiation is “small enough” and economies of scale are “small enough” that the number of firms in the industry is “large enough” that the implications of strategic interdependence become of only secondary importance when looking at outcomes for a market as a whole. For any specific firm in the market, facing any particular decision, strategic implications may still be important. But the strategic interplay has little impact on the performance of the market as a whole and is thus ignored, so that we are not prevented from seeing the forest by the complexity of all the trees.

The model of perfect competition requires a lot of assumptions (rational players, complete information, price takers, etc.) and thus is not a close representation of “reality.” As such, its predictions will not exactly hold for any particular firm in any particular industry. The goal of any model we’ve used is to highlight the implications of the most important decisions in any situation and to predict the most important general effects of changes in outside factors on the market and players being modeled. If there are “enough” firms and if they are “small enough” that taking them to be price takers will not cause us to ignore strategic interactions that have large effects on the whole market, assuming all firms are price takers becomes a useful if unrealistic simplification that allows us to focus on the big picture.
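To get a feel for how quickly marginal revenue converges to price as demand becomes more elastic, here is a minimal numeric sketch in Python (the price of 10 is an arbitrary value chosen only for illustration):

    # Marginal revenue MR = p(1 + 1/eta) approaches price p as the
    # firm-level demand elasticity eta grows in absolute value.
    p = 10.0  # hypothetical price, chosen only for illustration

    for eta in [-2.0, -5.0, -10.0, -50.0, -1000.0]:
        mr = p * (1 + 1 / eta)
        print(f"eta = {eta:8}: MR = {mr:.3f} (price = {p})")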

Perfect Competition in the Short-run

Looking at the short run, firm i’s profit is

\pi_i = p q_i - STC_i(q_i) .

Maximizing gives:

\frac{d\pi_i}{dq_i} = p - SMC_i = 0 \;\Rightarrow\; p = SMC_i .

Looking at this graphically gives the figure below.



Firms produce where MR=MC to maximize profits. Since p=MR here, that means producing where p=MC. So, for a given price (two are shown in the figure), a firm simply looks to its marginal cost curve to determine how many units to sell. Thus, the portion of the firm’s MC curve that lies above its AVC curve can be thought of as its supply curve. If price were less than the minimum possible AVC, the firm could not even cover all of its variable costs. Thus, it would do better to shut down and simply lose its fixed costs rather than to produce and lose all of its fixed costs plus some variable costs.

[Figure: firm i’s supply curve is the portion of MC_i lying above AVC_i; prices p1 and p2 map to quantities q1 and q2.]

If you were to solve the equation p − MC_i(q_i) = 0 for q_i, you would obtain firm i’s supply curve, q_i^s(p). The firm’s supply curve gives the quantity the firm produces to maximize profit given the market price. Everything the individual firm needs to know about demand and all the other firms is summed up in the market price. The overall supply curve of the market is then the sum of all of the individual firms’ supply curves:

S(p) = \sum_{i=1}^{n} q_i(p) .

Graphically, market supply is the horizontal sum of the individual firms’ supply curves. With the market supply, we can readily look at firm behavior and market equilibrium, the interconnections between the two, and firm and market responses to changes in parameters such as taxes, regulations, wages, interest rates, incomes, or anything else that affects supply or demand.

[Figure: two panels. Firm: MC_i and AVC_i shift up to MC_i′ and AVC_i′; firm output is q_i(p_e). Market: supply S(p) = ∑ q_i(p) shifts left to S′; the equilibrium moves from (Q_e, p_e) to (Q_e′, p_e′) along demand D.]

In the figure above, the market equilibrium price, pe, balances supply and demand; otherwise there would be either a shortage or a surplus, driving price up or down, respectively. At that price, each firm produces according to its marginal cost curve, or individual supply curve, at qi(pe). Now suppose a regulation or tax drives up variable costs for individual firms. The firm’s cost curves will be higher (shift up), to MCi′ and AVCi′. This shifts market supply left, decreasing equilibrium quantity to Qe′ and raising equilibrium price to pe′.



By making our earlier assumptions that led to marginal revenue approaching price, we eliminated the strategic interactions between firms. Note again that this doesn’t mean those interactions don’t exist; it’s just that making these assumptions allows us to add in a supply curve and talk about markets as a whole. That is, it allows us to simultaneously analyze a representative firm and the market as a whole in the two simple figures above.

Example

There are 25 identical firms, each with short-run total cost STC = 10 + 2q + q^2/4. Market demand is D(p) = 200 − 10p. To find the marginal cost of each firm, simply take the derivative.

SMC = 2 + q / 2

To find supply, set p = SMC and solve for q.

SMC = 2 + q/2 = p

q/2 = p − 2
q = 2p − 4

Since the market supply curve is simply the sum of the individual firms’ supply curves, with identical firms, just multiply an individual firm’s supply by n.

S(p) = 25 q_i = 25(2p − 4) = 50p − 100

To find equilibrium price and quantity, set supply equal to demand.

S(p) = D(p)
50p − 100 = 200 − 10p
60p = 300
p = 5

At p = 5, each firm supplies q = 2(5) − 4 = 6 units, so equilibrium market quantity is Q = 50(5) − 100 = 150.



Putting this in a graph gives the figure below.

[Figure: two panels. Firm: MC and AVC, with p_e = 5 and q(p_e) = 6; a dashed line marks MinAVC. Market: S and D cross at p_e = 5 and Q_e = 150.]

Equilibrium firm-level and market-level quantities are shown, along with the equilibrium price. We know that firms will only produce where price is greater than AVC, or else they’re not even covering their variable costs. So, at the firm’s minimum average variable cost, the market supply curve cuts off. For this reason, market price will never fall below the lower dotted line in the short run.
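As a check on the algebra, here is a minimal Python sketch that reproduces the numbers in this example (the function names are ours, chosen only for illustration):

    # Verify the short-run example: 25 identical firms,
    # STC = 10 + 2q + q^2/4, market demand D(p) = 200 - 10p.
    n = 25

    def firm_supply(p):
        # From p = SMC = 2 + q/2, each firm supplies q = 2p - 4.
        return 2 * p - 4

    def market_supply(p):
        return n * firm_supply(p)     # S(p) = 50p - 100

    def market_demand(p):
        return 200 - 10 * p

    # Solve S(p) = D(p): 50p - 100 = 200 - 10p  =>  60p = 300.
    p_e = 300 / 60
    print(p_e)                  # 5.0
    print(firm_supply(p_e))     # 6.0 units per firm
    print(market_supply(p_e))   # 150.0 units in total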

Perfect Competition in the Long-run

If firms are making economic profits (π > 0), there will be entry. Thus, for an industry to be in equilibrium, it must be the case that no firm can enter and receive a price higher than its minimum average cost. Even though firms aren’t making an economic profit, they are still all maximizing their individual profits. Thus, every firm in the industry sets marginal revenue equal to marginal cost, and since price equals marginal revenue,

p_i = LRMC_i ,

where LRMC_i is the long-run marginal cost of firm i. For the marginal firm, firm n, we assume that pn ≈ LRACn. The reason it is approximately equal to, and not exactly equal to, LRAC is that economies of scale may be such that firm n, which is in the industry, makes a profit, but firm n+1, which is not in the industry, would make a loss if it entered. When we work problems, however, we assume that pn = LRACn for simplicity.

Since the marginal firm produces where the (given) market price equals long-run average cost (pn ≈ LRACn), and for every firm in the industry price equals long-run marginal cost (pi = LRMCi), it follows that LRMCn = LRACn. We know that marginal cost equals average cost at the minimum of the average cost curve; so, in the long run, the marginal firm will produce where price equals minimum average cost (p = MinLRAC), which is also where marginal cost equals average cost. If price were higher than MinLRAC, the marginal firm would be making a profit, and firms would enter; if it were lower, the marginal firm would be making a loss, and firms would exit.



At this point, it is necessary to distinguish between two possible types of industries: constant cost industries and increasing cost industries. In an increasing cost industry, as the industry expands, it drives up the prices of the inputs it uses, or it has to put into use inputs that are less productive. If the industry must put to use inputs that are less productive, that has the effect of driving the prices of the more productive inputs above their “opportunity” cost. Such increases in the prices fetched by the more productive inputs are known as “rents.”

To take the simplest and, perhaps, most compelling example, consider the market for agricultural grains. If little grain is to be produced, it is produced only on land which is both well suited to grain production (fertile) and not well suited to other uses. Since the land is not well suited to other uses, it will have a relatively low price. However, as the market expands, more land must be put into production. Competition for the more productive land will drive up its price. So, it will become profitable to use land that is slightly less productive, or to divert productive land that had some other productive use (say, in ranching) to grain production. The land that was most fertile but had few other uses now fetches a higher price. That is an economic “rent” – a payment in excess of the reservation value at which the landowner would have been willing to lease the land for grain production.

Economic rents can accrue to other types of inputs as well. Michael Jordan, Peyton Manning, Angelina Jolie, and Johnny Cash would all have been happy to perform for far less than they actually earned at the height of their careers. If the demand for professional women’s soccer were higher, Mia Hamm and Abby Wambach would be much wealthier than they are. On the other hand, if the demand for professional football were lower, Peyton Manning would not be as wealthy as he is and many journeymen NFL quarterbacks would be in other lines of work.

At the most basic level, individual workers, land, materials, etc. are idiosyncratic and thus better suited to some uses than to others. If an industry is large enough, and thus uses a large share of a certain kind of input, as the industry expands or contracts, it drives up or down the going rates for the best suited inputs, and brings less well suited inputs into the industry or expels them from it. So, if an industry consumes a large share of some of the things it uses as inputs, as it grows, the minimum long-run average cost of running a firm in the industry goes up. The construction industry drives up the wages of unskilled manual laborers as it expands, and drives them down as it contracts. The agricultural industry drives up the price of fertile land as it expands, and drives it down as it contracts. Sports leagues drive up the salaries of the most skilled players as they expand, and the movie industry drives up the payments to the most skilled actors and actresses as it expands. These would all be increasing cost industries, though, depending on institutions and the question at hand, they may or may not be reasonably approximated as perfectly competitive.



For some cases, though, the industry is a small player in the markets for the factors of production it uses (for example, the furniture industry), or the effect of the industry on the prices of the inputs it uses is not of much direct interest to the question under study (for example, when studying policies to encourage energy efficiency in residential construction). In such cases, it is more accurate, or at least simpler and accurate enough, to just assume the industry is a constant cost industry. We also assume that all firms, whether in the industry or not, have an identical cost structure. In a constant cost industry, the industry can expand and contract without affecting the cost or productivity of the factors of production used in the industry. Therefore, as the industry expands or contracts, the cost curves of its representative firms remain constant.

In any event, when the industry is in long-run equilibrium, price equals the marginal firm’s minimum long-run average cost, as shown in the figure below. The difference is whether the long-run industry supply curve slopes up or is flat (perfectly elastic). We have drawn two potential long-run industry supply curves (LRIS) in the figure. The flat LRIS curve represents a constant cost industry. The other possibility is that average costs increase as the industry expands, in which case the LRIS slopes up, representing an increasing cost industry.

[Figure: long-run equilibrium. Firm panel: LRMCn and LRACn, with price peLR at minimum LRAC. Market panel: demand D0 and two possible long-run industry supply curves – a flat LRIS (constant cost industry) and an upward-sloping LRIS (increasing cost industry) – with equilibrium (QeLR0, peLR0).]

Relationship Between the Long-run and the Short-run

In the next figure, we deal with a constant cost industry and add a short-run supply curve to the initial picture of the market equilibrium, and also add the firm’s short-run marginal cost curve. Suppose market demand shifts out, for whatever reason. New demand intersects the short-run supply curve at a price of ptemp, and we see that our marginal firm is making a profit (since this price is above average cost). Thus, firms will enter, shifting the short-run supply curve to SSR1. Since all of the firms have the same cost structure, and the addition of new firms doesn’t affect cost (constant cost industry), price will tend back toward the original equilibrium price, which is why the long-run industry supply is flat.



[Figure: constant cost industry after a demand increase. Firm panel: LRMC, SRMC, and LRAC. Market panel: demand shifts from D0 to D1; price rises to ptemp along SSR0; entry shifts short-run supply to SSR1 and price returns to peLR along the flat LRIS.]

The other possibility is that, as demand increases, average costs increase as firms are added to the industry, either because the firms being added are less efficient than the original firms, or just because of competition for the factors of production best suited to the industry. This is the case in the figure below. In the left panel, the average cost curve moves from LRAC0 to LRAC1 as entry occurs. Thus the long-run industry supply curve (LRIS) slopes up and the long-run equilibrium price increases from peLR0 to peLR1.

[Figure: increasing cost industry. Firm panel: LRMC and LRAC shift from LRMC0, LRAC0 to LRMC1, LRAC1. Market panel: demand shifts from D0 to D1 and short-run supply from SSR0 to SSR1; the equilibrium moves from (QeLR0, peLR0) to (QeLR1, peLR1) along the upward-sloping LRIS.]


Example

The long-run cost function for firms in a constant cost industry is C(q) = 2q − 0.2q^2 + 0.01q^3. Market demand is QD = 450 − 100p. Find the equilibrium price, supply, equilibrium quantity, and the number of firms in the industry in the long run. We know firms produce at their minimum average cost; in other words, where marginal cost equals average cost. So, set the two equal, and remember average cost is simply cost divided by q.

AC = 2 − 0.2q + 0.01q^2
MC = 2 − 0.4q + 0.03q^2

Setting AC = MC:

2 − 0.2q + 0.01q^2 = 2 − 0.4q + 0.03q^2
0.2q = 0.02q^2
0.2 = 0.02q
q = 10

Now, how do we find price? It may help to look at the graph. We found firm quantity to be 10 when average cost is at its minimum, and we can plug that into either the LRMC curve or the LRAC curve to find price.

[Figure: LRMC and LRAC; the long-run price peLR sits at the minimum of LRAC, where q = 10.]

LRAC(10) = 2 − 0.2(10) + 0.01(10)^2 = 2 − 2 + 1 = 1, so peLR = 1.

Since this industry is constant cost, the long-run industry supply curve is flat at a price of $1. To find industry demand, plug the long-run price of $1 into the demand curve:

QD(1) = 450 − 100 = 350.

Since each firm produces 10 units, there are 350/10 = 35 firms in the industry in the long run.

[Figure: market demand D and the flat LRIS at $1 cross at Q = 350.]
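A sketch of the same computation using the sympy computer algebra library (using sympy here is our choice; any symbolic or even hand calculation gives the same result):

    # Long-run example: C(q) = 2q - 0.2q^2 + 0.01q^3, demand Q = 450 - 100p.
    import sympy as sp

    q = sp.symbols("q", positive=True)
    C = 2*q - sp.Rational(1, 5)*q**2 + sp.Rational(1, 100)*q**3
    AC = C / q
    MC = sp.diff(C, q)

    q_star = sp.solve(sp.Eq(AC, MC), q)[0]   # firm output at minimum LRAC
    p_lr = AC.subs(q, q_star)                # long-run price = min LRAC
    Q_lr = 450 - 100*p_lr                    # market quantity at that price
    n_firms = Q_lr / q_star

    print(q_star, p_lr, Q_lr, n_firms)       # 10, 1, 350, 35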



Value Added and Social Welfare in (more or less perfectly) Competitive Markets

What we want to do now is measure welfare in competitive markets, that is, how much value is added because the product market exists. Part of that value is captured by consumers in the form of consumer surplus (CS). Consumer surplus was defined earlier in Chapter 9. It is simply the difference between what consumers would be willing to pay for the amount of the product purchased, as measured by the area under the demand curve, and what they do pay, which is price times quantity. So, in a normal market equilibrium, consumer surplus is the area under the demand curve and above price. Mathematically, at price p, if demand is QD(p) and inverse demand is denoted pD(Q):

CS = \int_0^{Q_D(p)} p_D(x)\,dx - pQ .

Remember CS is just an approximation. First, it relies on obtaining a reasonable estimate of a demand curve, which is difficult to do in the first place. Second, it assumes consumers maximize [V(q) − pq], but this ignores income effects. Suppose you’d be willing to buy at most three burritos in any given week. If the cost of everything else you buy goes down, you can imagine you’d be willing to pay more for any particular number of burritos, because of your increased purchasing power. This model doesn’t take that into account, and so it’s not completely accurate, but it is accurate enough for many purposes, and it is easy to use and allows us to gain many additional insights into the workings of markets.

We need a similar measure for the benefits captured by the producer side of the market. Again, in a normal equilibrium, revenue received by the producers is simply price times quantity. Since the supply curve represents the marginal cost curves of the producers, the area under the supply curve represents the total variable costs of all producers. The area under price and above the supply curve therefore represents revenue received in excess of variable costs. This is producer surplus, PS.

[Figure: producer surplus PS is the area above the supply curve S and below the equilibrium price p_e, out to demand D.]

Since producer surplus just adds up the area between the supply curve and the equilibrium price, in a normal equilibrium it can be expressed as the corresponding integral. Specifically, if p is price, S(p) is the supply curve, and pS(Q) the inverse supply curve:

PS = p\,S(p) - \int_0^{S(p)} p_S(x)\,dx .

Since the supply curve represents firms’ marginal costs, producer surplus is closely related to profit. In the short run, profit is equal to producer surplus less any fixed costs that are sunk, and also less quasi-fixed (start-up) costs that are not sunk until output exceeds 0. So, in the short run, producer surplus is profit plus those fixed costs. That is:

PS = \sum_i (\pi_i + F_i) .

In the long run, profits are 0 in equilibrium. If the LRIS is flat, that is, if the industry is constant cost, producer surplus is 0 too, as shown in the figure below.

[Figure: constant cost industry – with a flat LRIS and demand D, there is no area between the equilibrium price p_e and supply, so PS = 0.]

However, in an increasing cost industry, producer surplus is positive, as shown in the next figure. How can that be if profit is 0, which it must be? Recall that in an increasing cost industry, either less efficient firms enter as the industry grows, or else competition between firms drives up input prices, or both. The surplus is actually being captured either by the shareholders of the “more efficient” firms or by the people who own the scarce (differentially more productive) factors of production employed in the industry.

[Figure: increasing cost industry – PS is the area between the upward-sloping LRIS and the equilibrium price p_e, out to demand D.]

Imagine a wheat farmer working a particularly fertile piece of land. If demand grows and price increases, the price of the land goes up. The owner of the land captures that profit. If the farmer or agricultural firm owns the land, they capture the producer surplus as rent. If they lease the land from someone else, the landowner captures it. The farmer or firm does not make any economic profit, but the owner of the land collects a higher economic rent.

Thus, the distinction between “less efficient firms” entering and the entry of firms driving up the prices of scarce factors of production is somewhat semantic. What some would call more efficient firms are simply the ones whose shareholders own the rights to some differentially more productive factors of production. The rights to these become more valuable as the industry grows. They may represent more productive land, rights to a proprietary production process, or even a long-term contract with a particularly efficient management team. But producer surplus reflects the appreciation of the value of the rights to these productive assets, not economic profit to the firm – economic “rents,” not economic “profits.”

Like consumer surplus, producer surplus is a useful approximation, but it has its imperfections. The presence of quasi-fixed costs is one failing of producer surplus in the short run. Fixed costs that are already sunk are just water under the bridge and should be ignored. So, imagining that producers are better off to the extent that revenue exceeds variable costs makes sense. However, the quasi-fixed costs are not sunk unless the firm produces, and while they are reflected in a short-run average variable cost curve, they will not be reflected in a marginal cost curve (mathematically, for those interested in such things, the total cost curve is discontinuous at 0 when there are quasi-fixed costs in the short run). That does not come up in the long-run model because LRIS depends on average cost, not just marginal cost, and average cost includes the quasi-fixed costs (which are possible even in the long run).

The other imperfection with producer surplus in the short run is that we are thinking in terms of the marginal costs faced by the firms under the current situation, current labor contracts, etc. But some of the inputs will be earning economic rents. Those are not truly costs, but, rather, represent “surplus” received by owners of the factors of production that are differentially more productive than the marginal unit. That surplus gets ignored in the standard treatment of producer surplus in a short-run model.

Now that we’ve defined CS and PS, we define total surplus to be the sum of the two:

TS = CS + PS = \int_0^{Q_e} p_D(x)\,dx - p_e Q_e + p_e Q_e - \int_0^{Q_e} p_S(x)\,dx = \int_0^{Q_e} \left( p_D(x) - p_S(x) \right) dx .

What outcome would maximize value added? Producing right to the point where the marginal value of another unit equals its marginal cost. The demand curve represents marginal value, and the supply curve represents marginal cost, so to maximize TS, produce where supply equals demand – that is, at the competitive market equilibrium, Q* in the figure below. Moreover, in a competitive market, production takes place at minimum long-run average cost, so there is no cheaper way to attain the given output. In general, competitive markets allocate resources efficiently, tending toward this equilibrium quantity where TS is maximized.

[Figure: total surplus is the area between demand D and supply S up to the equilibrium quantity Q*, where the curves cross.]

There are a lot of underlying assumptions. A big one is that the only ones who receive benefits from the products are consumers and the only ones who pay costs are the suppliers. If third parties outside the market transactions are helped or harmed as a result of what goes on in the market – that is, if there are external or spillover benefits or costs – supply and demand no longer represent the true marginal value and marginal cost. For example, if an industry causes pollution, it’s easy to see how other people could effectively be paying costs that fall outside our classification of consumer and producer. Another is that the allocation is “efficient” GIVEN the existing distribution of wealth – the analysis above said nothing about whether or not that initial distribution was a desirable one. Nonetheless, this tendency toward maximizing value added, or social welfare conditional on the given initial distribution of wealth, is why the free operation of supply and demand in competitive markets is the standard by which markets are judged.
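To make the surplus formulas concrete, here is a minimal sketch that evaluates CS and PS for the linear curves from the short-run example earlier in the chapter, demand D(p) = 200 − 10p and supply S(p) = 50p − 100. The use of scipy’s quad routine for the integrals is our implementation choice, and the curves come from our example, not from data on any actual market:

    # Consumer and producer surplus in the linear example:
    # demand D(p) = 200 - 10p  =>  inverse demand  p_D(Q) = 20 - Q/10
    # supply S(p) = 50p - 100  =>  inverse supply  p_S(Q) = 2 + Q/50
    from scipy.integrate import quad

    p_e, Q_e = 5.0, 150.0                  # equilibrium from the example

    p_D = lambda x: 20 - x / 10
    p_S = lambda x: 2 + x / 50

    CS = quad(p_D, 0, Q_e)[0] - p_e * Q_e   # area under demand, above price
    PS = p_e * Q_e - quad(p_S, 0, Q_e)[0]   # area above supply, below price

    print(CS, PS, CS + PS)                  # 1125.0, 225.0, 1350.0

These match the triangle formulas: CS = ½ · 150 · (20 − 5) = 1125 and PS = ½ · 150 · (5 − 2) = 225.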



Chapter 17 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Constant Cost Industry – An industry in which firms are identical and whose expansion or contraction leaves input prices, and thus firms’ cost curves, unaffected, so the long-run equilibrium price never changes.

Increasing Cost Industry – An industry in which new firms are not as efficient as existing firms, or in which expansion drives up input prices, raising long-run average cost for the marginal firm and causing higher equilibrium prices. Some industries are so big that they drive up the prices of their own inputs as they expand.



Chapter 18 Applications of Supply and Demand Analysis

We know that in equilibrium, demand D(p) equals supply S(p). Quantity demanded can be written as

QD = D(p, m, pRC, nC, zC),

which says demand depends on price (p), income (m), the prices of related goods, such as substitutes and complements (pRC), the number of consumers (nC), and anything else unaccounted for (zC). Based on our model of supply, we can write quantity supplied as

QS = S(p, pI, pRP, nF, zS),

which says supply depends on price (p), input prices, such as wages and capital rental rates (pI), the prices of goods related in production, for example goods whose assembly lines can be switched from producing one good to another relatively quickly (pRP), the number of suppliers (nF), and anything else unaccounted for (zS).

Basic Comparative Statics

Since we know supply and demand depend on all of the above factors (and more), we can ask what happens to each if one of the factors changes. For example, suppose we’re looking at the market for construction services. If, for some reason, fewer illegal aliens are available to work on construction jobs, labor costs will rise. This increase in costs will shift the supply curve for the construction industry left, and the simple shift shown in the figure below tells us price will increase and quantity will decrease. Essentially, eliminating the strategic interactions by assuming all firms are price takers allows us to use very simple graphs to see the overall effects of a change in any of the variables that influence supply or demand.

[Figure: supply shifts left from S to S′; price rises from P1 to P2 and quantity falls from Q1 to Q2 along demand D.]



Impact of a Tax

Suppose we start off with a constant cost industry and impose an excise tax of $t per unit on producers. If pD is the price paid by demanders and pS is the price received by suppliers:

pD = pS + t .

Since producers pay the tax, we can think of the tax as an increase in suppliers’ costs, causing an upward shift in the LRIS curve, as shown in the figure below.

[Figure: the flat LRIS shifts up by t to LRIS + t; price rises from p0 to p0 + t and quantity falls from Q0 to Q1 along demand D; the tax revenue rectangle is t × Q1 and the DWL triangle lies between demand and LRIS over Q1 to Q0.]

The revenue raised by the tax is simply t dollars per unit, times the Q1 units that are sold after the tax is instituted (the area of the rectangle). From a social perspective, the tax revenue is just a transfer of wealth, and doesn’t cause any loss in surplus. However, due to the fact that every unit now costs p0 + t, consumers will buy less. The value of the Q1th unit is MC + t, so the units just beyond Q1 still have a value that exceeds the actual cost of producing them; but the tax means they are not purchased. The tax prevents the production and consumption of the last Q0 − Q1 units that would otherwise have been transacted in the market, and these are valued above their cost. Thus the tax creates a deadweight loss – the triangle in the figure. It represents surplus that someone used to get (consumers in this example) that vanishes due to the tax. That represents a net loss of social welfare.

Depending on what statistics you look at, $1 of government revenue costs taxpayers $1.20–$1.30, exactly because taxes destroy some social surplus. Theoretically this means that you should only use tax revenue to fund public ventures that have benefits 20–30% greater than their cost, since it costs an extra 20–30% just to raise the revenue. (This is quite aside from arguments about fiscal stimulus spending when/if monetary policy is insufficient in a severe recession – but that is a topic for a class in macroeconomics.)

Now let’s look at an ad valorem tax, and further let’s suppose we have an increasing cost industry (or that we are looking at the short-run impact). An ad valorem tax is just one that’s proportional to value; a typical sales or property tax is an example. If pD is the price paid by demanders, pS is the price received by suppliers, and t is the tax rate, the three are related by the following equation:

pD = (1 + t) pS .

To find the equilibrium price(s) and quantity after the tax, simply use that relationship to substitute into the demand curve, and then equate supply and demand. That is:

309


D(pD) = D((1 + t) pS) = S(pS) .

Since the tax is a percentage of the supplier price, the after-tax supply curve can be seen as an upward rotation of the original supply curve, as shown in the figure below.

[Figure: ad valorem tax with upward-sloping LRIS – supply rotates up to LRIS × (1 + t); demanders pay pD, suppliers receive pS, and the old price pOLD lies between them; the tax revenue rectangle is (pD − pS) × QNEW and the DWL triangle lies between demand and LRIS over QNEW to QOLD.]

The first thing to note is that, compared to the old price, the price demanders pay doesn’t go up by the full amount of the tax, and the price suppliers receive doesn’t go down by the full amount of the tax. The difference between pD and pS is the value of the tax, but how it is split up between suppliers and demanders depends on the sensitivity of supply and demand to price changes (on the elasticities of supply and demand). The only time consumers pay the old price plus the entire tax is when we’re looking at a constant cost industry, where the supply curve is completely flat, or when the demand curve is vertical. The intersection of the upward-rotated supply with demand determines the price demanders pay (pD) and the new equilibrium quantity (QNEW). Suppliers receive that new price less what they pay as tax, pS = pD/(1 + t). The gray rectangle has a height of pD − pS, which is the value of the tax (the difference between what demanders pay and what suppliers receive), and a length of QNEW, the quantity sold after the tax; thus, the area of the rectangle represents the tax revenue. The area of the rectangle that falls above the old price (pOLD) can be thought of as the portion of the tax paid by consumers, and the area that falls below the old price can be thought of as the portion paid by suppliers. The area of the triangle labeled DWL is deadweight loss.
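A minimal numeric sketch of ad valorem tax incidence, reusing the linear example curves (demand 200 − 10pD, supply 50pS − 100); the tax rate t = 0.2 is an arbitrary illustrative value:

    # Ad valorem tax in the linear example: demand 200 - 10*pD,
    # supply 50*pS - 100, with pD = (1 + t)*pS.
    t = 0.2   # illustrative tax rate

    # Solve 200 - 10*(1 + t)*pS = 50*pS - 100 for the supplier price.
    p_S = 300 / (60 + 10 * t)
    p_D = (1 + t) * p_S
    Q_new = 50 * p_S - 100
    Q_old = 150.0                          # pre-tax equilibrium quantity

    revenue = (p_D - p_S) * Q_new          # the gray rectangle
    dwl = 0.5 * (p_D - p_S) * (Q_old - Q_new)   # the DWL triangle

    print(round(p_D, 2), round(p_S, 2), round(Q_new, 1))   # 5.81 4.84 141.9
    print(round(revenue, 1), round(dwl, 1))                # 137.4 3.9

As the figure suggests, demanders pay about $0.81 more and suppliers receive about $0.16 less than the old price of $5, so here most of the tax falls on consumers.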

Impact of a Subsidy or a Price Floor

Now we’re going to look at an example where the government provides subsidies to an agricultural industry. The first figure represents the market before the subsidy. In the figure, short-run supply (SS) is less elastic than long-run supply (LRIS). This is because in the short run, firms face a fixed factor, which limits their ability to respond to price changes. In the long run, producers can enter or exit the industry, increasing their capacity for production and their ability to respond to price changes. Similarly for consumers: if the price of gas goes up, there’s not much they can do about it in the short run, so quantity demanded won’t drop much; in the long run, consumers can buy more fuel-efficient cars, find alternative ways to work, etc., and as the number of substitutes for gas increases, demand for gas becomes more elastic.

[Figure: market before the subsidy – short-run supply SS, long-run supply LRIS, short-run demand DS, and long-run demand DL.]

Now suppose the government imposes a subsidy of $s per unit on the market. A subsidy means the government is providing a certain amount of money per unit to suppliers or consumers. Let’s assume it is paid to suppliers for this example, in which case we can interpret the subsidy as a downward shift in supply (a reduction in cost). Note that this affects both short-run and long-run supply. Initially, quantity demanded increases from the initial quantity (Q0) to the short-run quantity with the subsidy (QS). The price paid by demanders goes down after the subsidy (pDS). Just like in our last example, to find the price suppliers have to receive in order to produce QS units, look up from QS to the original supply curve; we see that the price suppliers receive goes up to pSS. The cost of the subsidy in tax revenue needed to support it in the short run is the subsidy per unit ($s, which is the difference between pSS and pDS) times the new short-run quantity after the subsidy (QS), or the area of the rectangle in the figure.

[Figure: subsidy in the short run and long run – SS shifts to SS − $s and LRIS to LRIS − $s; prices pSS, P0, and pDS; quantities Q0, QS, and QL.]

When we look at the quantity demanded in the long run (QL), we see that it is higher than in the short run. This is because suppliers and demanders are more price sensitive in the long run, so the changes in quantities brought about by taxes and subsidies are bigger in the long run than they are in the short run. The cost of the subsidy in tax revenue and the additional deadweight loss it creates in the long run are shown in the next figure. We know that the optimal quantity is where supply crosses demand, or Q0. We see the subsidy causes more units than this to be produced, so the deadweight loss is due to over-production. Thus, with a subsidy, deadweight loss occurs because the last Q1 − Q0 units produced had a higher marginal cost than marginal value. The DWL is illustrated in the figure as the area between the supply curve and the demand curve to the right of the equilibrium quantity (the area of the triangle). The cost of the subsidy itself in tax revenue needed to fund it is the gray rectangle, which is just the subsidy amount per unit ($s, or pS − pD) times the quantity sold after the subsidy (Q1).

[Figure: long-run impact of the subsidy – LRIS shifts to LRIS − s; prices pS, P0, pD; quantities Q0 and Q1; the DWL triangle lies between supply and demand to the right of Q0.]

Another way the government encourages higher prices to suppliers in certain industries is a price support. A price support is a mandated price that’s higher than the equilibrium price. We know that at prices higher than equilibrium there’s a surplus; so, basically, the government promises to buy up any surplus created by the price support. Note that a price support is the same thing as a price floor, since the government is saying price can’t go below a certain point. With the floor, quantity demanded is QDF and quantity supplied is QSF, so there’s a surplus. The amount it costs the government to buy up the surplus is just the price being charged (the support level, pSUP) times the quantity of the surplus (QSF − QDF), or the area of the rectangle. The deadweight loss is the area of the rectangle minus the triangle on top that falls above the supply and demand curves; this is because the area below demand but above supply is surplus value that could have gone to consumers and producers if price were at equilibrium, and the area below supply is the cost of producing the surplus units, which are of no value to anyone in the end. The rest of the rectangle is just part of the transfer of wealth from taxpayers to firms.

[Figure: price floor – the support price pSUP above the equilibrium price P0 creates a surplus of QSF − QDF.]

The question of what is actually done with the surplus the government buys is tricky. Imagine the price support is instituted in the wheat industry. If the government buys up a surplus of wheat, it’s tempting to say it could just give the wheat to charity; the problem is, if it does, the charities buy less from the markets, driving price down and causing more of a surplus. What usually happens, then, is the government either pays producers not to grow it in the first place, or stores the surplus until it rots.
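A quick sketch of the taxpayer cost of a price support, again with the linear example curves (demand 200 − 10p, supply 50p − 100); the support level of $6 is an arbitrary illustrative value set above the $5 equilibrium:

    # Price support in the linear example: demand 200 - 10p, supply 50p - 100.
    p_sup = 6.0   # illustrative support price, above the equilibrium of 5

    Q_dem = 200 - 10 * p_sup     # quantity demanders buy at the floor
    Q_sup = 50 * p_sup - 100     # quantity suppliers produce at the floor
    surplus = Q_sup - Q_dem      # excess the government must buy

    cost = p_sup * surplus       # taxpayer cost of buying up the surplus
    print(Q_dem, Q_sup, surplus, cost)   # 140.0 200.0 60.0 360.0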

Price Ceilings and Price Gouging Laws

Suppose we’re looking at the market for some scarce resource, such as ice, that gets impacted during a natural disaster. If a hurricane hits, the demand for the good will increase, shifting the demand curve right; however, since the hurricane has knocked down power lines and closed off roads, it’s harder for suppliers to transport or produce ice, and so supply is reduced, shifting the supply curve left. It’s not clear what will happen to quantity sold, but it is certain that the equilibrium price will increase.

[Figure: after the disaster, demand shifts from D to D′ and supply from S to S′; the equilibrium price rises from p0 to p′; with a price ceiling at p0, quantity supplied QS falls short of quantity demanded QD, and pD is the demand price at QS.]

What happens sometimes is that people will complain about increases in prices during disasters, and the government may step in and tell suppliers that they can’t raise price. If this is the case, and p0 becomes the price ceiling, at the new levels of supply and demand there will be a shortage (QD − QS). The argument in favor of the price ceiling is that even though quantity supplied is lower than it would be if price were allowed to increase, the price is kept down and thus people of all incomes can afford the commodities they need most during disasters. In other words, if price were allowed to increase, the argument goes, only the rich would be able to buy ice. The problem with this is that the task of rationing the limited quantity of the good falls on the firms, and they have to put people in lines to determine who gets what, which leaves many people who would otherwise be willing to pay a higher price unable to purchase what they need.

Q′ would be the new equilibrium quantity if price were allowed to rise to the new equilibrium level, p′ (note, Q′ could be higher or lower than Q0; in this picture, we’ve drawn it as being higher). The price that demanders would be willing to pay for another unit given the shortage is pD. So, the “standard” deadweight loss is just the value that would have been created if price were allowed to increase and more units were sold, or the area of the triangle in the graph. The shaded rectangle to the left of the triangle may be deadweight loss as well. This is because with a price ceiling, people are still willing to pay pD. So, they will use their extra resources to try to increase their likelihood of obtaining the limited goods. This could be in the form of bribery, hiring someone to wait in line for you, etc. When taking this extra cost into account, the price that customers effectively pay approaches pD, but potentially only p0 goes to producers – the rest may largely be wasted (time spent standing in a line instead of doing something productive). If this is the case, the shaded rectangle could be considered deadweight loss as well.

Since we’ve just shown how taxes, subsidies, price floors, and price ceilings all create deadweight losses, it seems that there’s no reason they should exist. However, in general, economists won’t mind taxing industries with spillover costs (such as pollution) or subsidizing industries with spillover benefits, as this actually causes the quantity produced to approach the optimum.



Chapter 18 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Consumer Surplus – The benefit received by consumers who can buy a product for less than their willingness to pay for it. Approximately, it is the triangular area under the demand curve and above the market price.

Producer Surplus – The benefit received by producers for selling goods at a price above their marginal costs. Approximately, it is the triangular area above the supply curve and below the market price.

Ad Valorem Tax – A tax charged to suppliers as a percentage of the price they receive for a product. This type of tax rotates the supply curve upward, increasing its slope.

Per-Unit Tax – A tax charged to suppliers per unit sold. This type of tax causes the supply curve to shift up (left), i.e., decreases supply.

Tax Revenue – The amount of money the government collects by implementing the tax.

Deadweight Loss – The value destroyed when the amount of resources devoted to an industry is not optimal, i.e., when the sum of consumer and producer surplus is not maximized.

Price Floor – A minimum price, set above the equilibrium price, at which a product may be sold. The government usually imposes this because the equilibrium price is considered too low, and it must buy up any surplus the floor creates.

Price Ceiling – A maximum price, set below the equilibrium price, at which a product may be sold. The government usually imposes this after a disaster to prevent price gouging. The argument in favor of this type of intervention is increased fairness.



Chapter 19 Market Structure Wrap Up

Summary of Models

The figure below provides a visual summary of the models we covered in the last four chapters. At the far left, we have the simplest, monopoly. A monopoly’s product is so differentiated from its nearest competitor’s that its pricing and output decisions have no impact on any other firms, and no other firm’s decisions have any impact on it. Of course, it is hard to imagine any firm whose market power is that extreme in reality, but it provides a very useful theoretical benchmark. Assuming the monopoly generates economic profit (which is not a sure thing – who wants a monopoly on 8-track cassette manufacturing?), other firms will want to enter the market to claim their share of the pie, if entry is possible. Three kinds of things may stand in the way of entrants – legal barriers, the incumbent’s strategy, and sheer economies of scale. Broadly speaking, legal barriers would include patent protections, licensing requirements, and any other form of regulation, tax, or government policy that makes it hard for a new competitor to enter a market. Possible incumbent strategies – like entry-limit pricing – we will cover later.

[Figure: summary of models. Starting from monopoly, fewer entry barriers mean more firms in the long run, where π ≈ 0. Homogenous products branch: capacity limits quantity competition, excess capacity means fierce price competition, cost management is accentuated, and advertising occurs at the industry level; oligopoly reaction functions slope down in quantity competition, and price competition can yield π = 0 with 2 firms; in the limit, perfect competition (supply and demand). Differentiated products branch: price or quantity competition (depending on the determinants of capacity limits), firm-level advertising is accentuated, and cost management is muted; oligopoly reaction functions slope up in price competition and down in quantity competition; in the limit, monopolistic competition.]

If economies of scale are so large that one firm in the market makes a profit but two firms would both make losses, the industry is characterized by natural monopoly. Traditionally, many utilities are examples of natural monopoly. For example, the electric power industry, at least on the distribution end, is taken to be a natural monopoly. The reason is that it is so expensive to put in place the infrastructure for electricity distribution that it would never make sense to operate two parallel distribution grids at the same time. This is one reason why utilities tend to be regulated – there is little room for market competition to bring prices down closer to marginal costs, and thus to bring the equilibrium output close to the socially efficient level.

Ignoring legal and strategic barriers, if economies of scale are not too severe, more firms enter. If the entering firms produce products that are essentially identical, we have a homogenous product oligopoly (moving clockwise along the top of the figure). If capacity is essentially unlimited, in that each firm has enough capacity to meet any possible quantity demanded, price competition between two firms can be so fierce that profits are driven down to zero, and the firms split the market. Logically, it would then make no sense to build the capacity needed to serve the whole market. So, with homogenous products, the strategic decision has to do with capacity, which indirectly determines price. The fact that products are homogenous means firm strategy has more to do with cost reduction than with stimulating demand. Indeed, the strategic effects of a cost reduction in quantity competition magnify its benefits, while the benefits of any firm-level advertising spill over to all firms. Thus, firms in truly homogenous product industries tend to advertise cooperatively through trade councils and industry associations, not individually and in a competitive manner.

As long as additional firms can expect to earn profits by entering, they will continue to do so. In the long run, no additional firm could hope to earn a profit by entering. If economies of scale are such that the industry can support only a small number of firms, strategic considerations will remain important. Indeed, a way for a firm or an entrepreneur to make a profit, at least for some time, is to find a more efficient way to fill the needs of customers, reducing costs and capturing the direct and strategic benefits of that. But market forces simply mean these profits will be eliminated over time.

On the other hand, if economies of scale are very small relative to the scale of market demand, many firms can enter. As they do so, the market share of any individual firm gets small. As this happens, the effect of any one firm’s decision to sell another unit of output is spread over more and more firms. So, marginal revenue grows ever closer to market price. In the extreme, marginal revenue is exactly equal to price when market share goes to zero. No firm has a market share of 0, but, in practical terms, the strategic effects of one firm on others become trivial as market share gets small. So, if we want to consider the performance of a market as a whole, it makes sense to treat firms as “price takers” and thus ignore complications arising from strategic effects when the number of firms is “reasonably” large. What constitutes “reasonably” large is in the eye of the beholder and depends on your judgment, given the needs of the question at hand. Industries of price-taking firms are termed perfectly competitive.



On the other hand, if each entrant produces a slightly different product from the other firms in the industry, we need models of differentiated product industries. Since the products are differentiated, entrants will not take all profit from incumbent firms simply by undercutting their prices a bit. That softens price competition – even when there is excess capacity, firms may make economic profits if the number of competitors is limited. In that case, strategic decisions may involve either price or quantity. In markets where lead and lag times are short, so that it is easy to expand or contract production sizably on an instant’s notice, it may make the most sense to think of these firms as engaging in price competition. In markets where lead and lag times are significant, and in which it takes a long time to ramp production up or down, it makes sense to think in terms of quantity competition – with capacity being the strategic variable.

With differentiated products, the role of cost management is muted. If a firm can’t capture as many of an opponent’s customers by undercutting its price, there is less reason to invest in cost reduction. Further, the opponents will respond by lowering their prices as well, strategically offsetting part of the direct benefit of the cost reduction. On the other hand, the role of advertising is enhanced. This might be advertising to inform customers of a product’s existence and characteristics, to persuade customers to switch, or to build brand loyalty and thus prevent customers from switching. Such advertising might emphasize differences that are potentially important to customers, or exploit human foibles and induce customers to prefer one product over another for no reason embodied in the product itself.

When advertising is thrown into the strategic mix, the picture becomes much more complex. That is because both price and advertising affect demand and therefore marginal revenue. So, when one firm advertises more, its reaction function shifts up – it will charge higher prices. That leads the other firm to raise its price as well. However, the change in advertising might also decrease the other firm’s demand, shifting its reaction function. To correctly analyze the situation, it is necessary to simultaneously determine price or quantity and advertising in the Nash equilibrium. While the complexity of that problem renders its solution somewhat beyond the scope of the class, the important thing to take away is that decisions about price and advertising should be made jointly, and that changes that lead one firm to advertise more or less will induce changes in both firms’ prices and the other firm’s advertising.

Firms will continue to enter as long as they can do so profitably. When economies of scale are such that more than a few firms can enter, the market structure may be referred to as monopolistic competition. Since products are differentiated, entry means an increase in product variety. As there are more and better substitutes, each firm will face a more elastic demand. As the product space becomes more densely packed, the firms have less and less control over their own prices. In the limit, marginal revenue approaches price as the number of firms becomes very large. So, when economies of scale allow the number of firms in a differentiated product market to become large, those firms essentially become price takers.



Thus, whether entry occurs in the form of identical products (the top of the figure) or differentiated products (the bottom of the figure), if enough firms enter, the firms become price takers in the limit. When firms are taken as price takers, we think of them as perfectly competitive. Once we decide to just treat firms as perfectly competitive, it becomes quite simple to relate the conditions facing a representative firm to market supply, and to relate the interaction of market supply and demand to the decisions of a representative firm. That gives us a simple yet powerful tool to structure our thinking about market outcomes.

Generally, as markets become more competitive, prices fall, as do profits. Firms may wish to undertake strategies to restrict the level of competition, or, at least, to soften it. On the other hand, as markets become more competitive, output increases, as does value added in the form of consumer and total surplus. Absent externalities of one sort or another, and ignoring concerns about the initial distribution of wealth, outcomes in a competitive market are socially efficient, in that everything with a marginal value in excess of its cost is produced, and, in the long run, everything is produced at the minimum long-run average cost, so there is no excess capacity. Thus, government rules, regulations, and antitrust policies ostensibly aim to promote pro-competitive practices, curb anti-competitive practices, and smooth over market imperfections.

In order to gauge the competitiveness of markets, it is necessary to measure markets and market power. Concentration ratios are one way to measure how competitive an industry is. Cm is the total market share of the largest m firms in an industry. The idea is that the larger the ratio, the more concentrated the industry, and the less likely it is to be highly competitive. If si is market share, and the n firms in the industry are indexed in descending order of their market share, so i=1 is the largest firm and i=n is the smallest firm:

C_m = \sum_{i=1}^{m} s_i .

C4, which is the most commonly used, measures the total market share of the largest 4 firms. Simple concentration ratios are limited because they say nothing about the other firms in a market. If the largest four firms hold 60% of a market, it may matter a great deal whether the next two largest firms have a 30% market share or a 5% market share. A Herfindahl-Hirschman index is an index that measures the concentration of all firms in an industry. The formula is

HHI = 10000 \sum_{i=1}^{n} s_i^2 .

Firm        1    2    3    4    5    6    7    8    9   10   Total
100s       20   20   10   10   10   10   10    4    4    2     100
10000s^2  400  400  100  100  100  100  100   16   16    4    1336

It is multiplied by 10,000 simply to get rid of the large number of decimals that arises when squaring small market shares, e.g. 0.01^2 = 0.0001. If an industry were a pure monopoly, its HHI would be 10,000. If every firm’s market share were zero (perfect competition), HHI would be 0. The table above shows a sample calculation for a hypothetical 10 firm industry. Concentration measures for selected U.S. industries are shown in the table below.

Concentration Measures for Selected US Industries

Industry                     C4    HHI
Fluid milk                    4    101
Soft drink                    4    710
Breakfast cereal              8    300
Bread & bakery products       4    581
Sugar                         5    856
Distilleries                  7    209
Furniture & related           1     57
Wood container & pallet       7     26
Paper mills                   5    883
Corrugated & solid fiber      3    392
Men’s & boys’ neckwear        6    140
Printing                      1     48
Pen & mechanical pencil       7    195
Ready-mix concrete            1     57
Dental equipment & supply     3    437
Basic chemical                1    160
Battery                       5    958
Petroleum refineries          4    809
Automobile                    8    275
Boat building                 3    573

Source: US Census Bureau, http://www.census.gov/epcd/www/concentration.html
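Here is a minimal sketch computing C4 and the HHI for the hypothetical 10-firm industry tabulated above:

    # C4 and HHI for the hypothetical 10-firm industry.
    shares = [0.20, 0.20, 0.10, 0.10, 0.10, 0.10, 0.10, 0.04, 0.04, 0.02]

    c4 = sum(sorted(shares, reverse=True)[:4])   # share of the largest 4 firms
    hhi = 10000 * sum(s ** 2 for s in shares)    # Herfindahl-Hirschman index

    print(round(c4, 2))   # 0.6  -> the largest four firms hold 60%
    print(round(hhi))     # 1336, matching the sample calculation above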

There are some inherent problems with using these ratios to measure competitiveness. First, just defining the market can be very problematic. Geography comes into play. For example, if you’re trying to measure the concentration of the distillery industry and you leave out imports, you’re going to overstate the concentration of the industry. If you’re looking at the ready-mix concrete industry, you must look city by city, since suppliers in New York won’t ship ready-mix concrete to Florida, due to basic logistics. The numbers in the table above are flawed for both of these industries, since they include only US firms, but include all US firms. Thus, the more closely the defined market matches the actual set of substitutes for a good, the better the concentration ratio represents it. Both time and intended use are also characteristics that must be matched for a product to satisfy a given customer.

An even bigger problem with these concentration ratios is that they ignore strategic interaction – that is, they ignore the nature of the competition between the firms! We’ve seen models where two firms compete in a homogenous product market, each with unlimited capacity, and price competition drives profits down to zero. The HHI for this industry would be 5,000, which would imply that it is highly concentrated, but the firms are just breaking even.

It would seem better, then, to look at more direct measures of market power. The most direct measure is simply the difference between price and marginal cost, or, the mark up factor. Recall

p^* = \left( \frac{\eta}{1+\eta} \right) MC ,

where η/(1+η) is the mark up factor. For a perfectly competitive industry, the mark up factor is 1, or, price equals marginal cost. To the extent that this factor increases, the market becomes less competitive.
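A minimal sketch of the mark up factor implied by various firm-level demand elasticities; the example elasticities match the firm-level figures in the table below:

    # Mark up factor eta/(1 + eta) implied by a firm-level demand
    # elasticity eta < -1.
    def markup_factor(eta):
        # p* = (eta / (1 + eta)) * MC
        return eta / (1 + eta)

    for eta in [-96.2, -5.2, -3.5, -1.9, -1.6]:
        print(f"eta = {eta:6}: markup factor = {markup_factor(eta):.2f}")
        # prints 1.01, 1.24, 1.40, 2.11, 2.67 in turn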



The biggest problem with this method is that it is almost impossible to measure short-run marginal cost at a specific point in time accurately. First of all, it can be hard to disentangle which costs are actually attributable to production of the marginal unit. Second, we must measure the full economic cost of production, including a normal (risk-adjusted, after-tax) return on capital. Often, about the best we can hope for is that average cost is close to marginal cost and that the accountants’ measures of total cost come close to capturing economic cost. In that case, total accounting cost divided by output may be a reasonably good proxy for marginal cost.

A 1987 working paper by Matthew Shapiro provides the most comprehensive attempt to measure market power in US industries. He undertook an econometric analysis of data from the U.S. National Income and Product Accounts. Not only is the work just over 20 years old (at the time of this writing), but the assumptions involved in the econometrics are extremely tenuous – constant demand elasticity at the firm level, constant returns to scale, constant share of labor in total cost. But it is still the best comprehensive set of estimates available. For any individual case, it would be better to rely on accounting data specific to the firm(s) involved (which may be proprietary and would not generally be easily available other than as part of some legal proceeding). But our purpose is to paint the U.S. market in broad strokes, and for that Shapiro’s paper is the best available. Calculations based on his research are shown in the table below.

Market Power Measures for Broad US Industries, 1949–1985

                              Mark Up    Demand Elasticity    ηMarket/
Industry                       Factor     Firm      Market     ηFirm
Agriculture                     1.01     −96.2       −1.8       0.02
Construction                    1.24      −5.2       −1         0.19
Durable Manufacturing           1.40      −3.5       −1.4       0.40
Nondurable Manufacturing        1.42      −3.4       −1.3       0.38
Transportation                  2.11      −1.9       −1         0.53
Communication and Utilities     2.25      −1.8       −1.2       0.67
Wholesale Trade                 2.67      −1.6       −1.5       0.94
Retail Trade                    2.25      −1.8       −1.2       0.67
Finance                         1.22      −5.5       −0.1       0.02
Services                        1.04     −26.4       −1.2       0.05

Source: Shapiro, Matthew D., “Measuring Market Power in U.S. Industry,” NBER Working Paper No. 2212, 1987.

The first column of the table above shows Shapiro’s estimate of the firm level profit‐maximizing mark up factor. Not surprisingly, it looks like agriculture and services are highly competitive, with a mark up factor essentially equal to 1.



Communications and utilities show considerably more market power (though the industries were regulated, so it is hard to know what to make of that). Transportation and wholesale and retail trade also showed considerable market power over the period in question. With declines in transportation costs and the rise of the internet, both of which make it easier to access substitute products, it is almost certain that market power in those industries is lower now than it was on average over the time period of Shapiro's sample. Construction and both durable and non-durable manufacturing were intermediate cases – some market power, but not much. The mark up factor on its own lacks an upper benchmark. It would be 1 for perfect competition. But with a pure monopoly, the mark up would depend on the market level elasticity of demand. Shapiro and others have suggested comparing the elasticity of demand faced by a firm (ηFirm) with the elasticity of demand for the market as a whole (ηMarket) by dividing the market elasticity by the firm level elasticity, (ηMarket/ηFirm). A perfectly competitive firm faces a demand elasticity of −∞, so the ratio would be 0. A monopoly faces the whole market demand, so the ratio would be 1. The ratio thus has a neat upper and lower bound corresponding to monopoly and competition. There are two additional (related) problems, though. While the firm level demand elasticity can be inferred from the mark up factor, we have to obtain a separate measure of the market demand elasticity. That requires a lot of data and provides more places to go wrong. Second, with differentiated products, it is difficult to even define what "market" demand is. All that really exists are separate but related demands for the different but related products. The best interpretation is something like the average percentage decline in purchases when ALL prices go up by 1%. That, obviously, is a hard thing to infer from any actually observed data. Shapiro, however, provides econometric estimates of this additional parameter as well. The last column of the table above presents his estimates of (ηMarket/ηFirm). Wholesale trade was quite monopolized over the sample period, as were retail trade and communications and utilities. Communications and utilities were largely regulated natural monopolies at the time. Wholesale and retail trade likely had monopoly power due to a combination of economies of scale and geographic isolation. Most likely, they are far more competitive now. Agriculture, construction, finance, and services appear quite competitive, overall. Any attempt to quantify market power will be imprecise. If it is based only on observed market share and concentration indices, it will be plagued by problems of market definition and the fact that it ignores the nature of competition. If based on observed mark ups, it will suffer from the difficulty of measuring actual economic marginal cost at any particular point in time. Estimates of concentration and mark up are important clues about market power, but are not sufficient in and of themselves to provide evidence of either market power or anti-competitive practices. It is also important to consider things like the availability of substitutes, the presence or absence of economies of scale and excess capacity, and other potential barriers to entry.
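To see how the numbers in the table fit together, this short sketch (my own illustration, not Shapiro's calculations) inverts the mark up factor to recover the firm level elasticity and then forms the ηMarket/ηFirm ratio for the durable manufacturing row.

# Invert m = eta/(1 + eta) to recover eta = m/(1 - m).
def firm_elasticity(markup):
    return markup / (1 - markup)

eta_firm = firm_elasticity(1.40)   # -3.5, the firm elasticity in the table
ratio = -1.4 / eta_firm            # 0.4: between competition (0) and monopoly (1)
print(eta_firm, ratio)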



Antitrust Law and Policy
Since perfectly competitive markets allocate resources efficiently, the U.S. government, through the Department of Justice (DOJ), uses antitrust laws to restrict anti-competitive practices. What follows is by no means an exhaustive or precise consideration of antitrust law or policy. It is merely a rough pass at some of the larger issues. The US Department of Justice considers industries with an HHI above 1800 "highly concentrated" and gives special attention to mergers and other practices in such industries. It also pays special attention to mergers that cause the HHI to increase by over 100. These are benchmarks for attention, not hard and fast rules about which mergers to allow or when to break up a large company. Market definition plays a crucial role in making such judgments, and many economists and other consultants are employed as experts in litigation to argue over exactly which products are substitutes for the products of firms under DOJ antitrust scrutiny. U.S. antitrust policy depends greatly on the "Rule of Reason" articulated in the 1911 U.S. Supreme Court ruling in Standard Oil Co. of New Jersey v. United States, 221 U.S. 1. The ruling held that only unreasonable restraints on trade are subject to actions under U.S. antitrust laws, and that market power in its own right is not illegal. For our purpose, that means things are taken on a case-by-case basis. For example, a merger that increases concentration and mark up, but allows the new larger firm to exploit economies of scale and scope and therefore to actually offer lower prices to consumers, would not be blocked simply because it might increase market power by some measures. A contract or practice that may appear to restrain trade on its face may be allowed if there is some sound economic reason for it beyond simply restraining trade to boost profits at the expense of customers.
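As a hypothetical illustration of the DOJ screens just described: when two firms with percentage shares s1 and s2 merge, the HHI rises by exactly 2·s1·s2, since (s1 + s2)² − s1² − s2² = 2·s1·s2. The snippet below applies that to made-up shares.

def hhi(shares):
    return sum(s**2 for s in shares)

pre = [30, 25, 20, 15, 10]        # made-up percentage shares
post = [30 + 25, 20, 15, 10]      # the two largest firms merge

delta = hhi(post) - hhi(pre)      # 1500, equal to 2*30*25
print(hhi(pre), hhi(post), delta) # 2250, 3750: both DOJ benchmarks are tripped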

Anti- and Pro-Competitive Strategies
Firms, however, are perfectly happy to restrain trade simply to boost profits. We regularly hear accusations in the press that some big firm or another is engaged in predatory pricing, for example. In this section, we consider a number of pricing strategies that have anti- or pro-competitive effects. Whether or not pricing strategies that drive out competitors are in a firm's interest, or even feasible, is not as straightforward as it might seem. And, when they are feasible and in a firm's interests, they might or might not be found to violate antitrust policy – that is an issue for a class in antitrust law or economics.

Preventing Entry with Entry-Limit Pricing
One of the tools a monopolist has in order to keep out potential competition is entry-limit pricing. The idea is that the monopolist, when faced with possible entry, could lower prices to a point where the entrant cannot compete. That would keep the entrant out and leave the monopolist the sole firm in the industry. However, remember our discussion of credible threats. The only incentive the monopolist has for lowering price is to keep the entrant out. If the entrant actually does enter, the monopolist has no incentive to keep prices low. So, the entrant can



predict that if they entered, the rational monopolist would accommodate the entry. So, pricing below the profit-maximizing level simply to deter entry is not a credible threat that post-entry prices will remain low. Entry-limit pricing is only feasible if the monopolist can commit to the threat of keeping prices low after entry. The goal of entry-limit pricing is to credibly commit to a price and output level that leaves the entrant's residual demand (dEntrant) below his long-run average cost curve (LRACEntrant), as in the figure below. If the monopolist can credibly commit to a strategy that leads to that outcome, the entrant will never be profitable and thus won't enter.

[Figure: the entrant's residual demand dEntrant lies everywhere below LRACEntrant, so entry is unprofitable.]

How can the monopolist make his threat credible? One way is to reduce marginal cost, so that the profit-maximizing price itself limits entry. This might be done by adopting a technologically advanced factory with high fixed costs but low marginal costs, so that the profit-maximizing price (where MR = MC) will be low and quantity will be high. It might also be done by moving first and building a much larger capacity than would otherwise be profitable. This is related to the type of production technology adopted: trading a high fixed cost for a low marginal cost to deter entry. A first mover also has the chance to reduce marginal costs by moving farther along the learning curve. Producing more than would otherwise maximize profit early on can lead to opportunities to learn by doing that reduce cost and improve efficiency. An aggressive first mover may learn a lot about the industry and develop such an advantage in their distribution network, operating structure, etc. that the competition is never able to "catch up," making entry unattractive. The problem is that even if you accomplish these things, your marginal cost may still not be low enough to keep out potential entrants. Even if marginal cost is 0, the profit-maximizing price is positive. If that price is higher than the entrant's cost, they will still have an incentive to enter. This is shown in the figure below.

[Figure: with demand D, marginal revenue MR, and MC = 0, the profit-maximizing price p* at quantity q* is still positive.]

The next possibility is to develop a reputation as a fighter. In this case, the monopolist deters entry through his reputation, as opposed to a physical reduction in marginal cost. What makes the threat credible here is the present value of future payoffs from keeping entrants out; in other words, if this present value is high enough, all players will rationally believe that the monopolist will lower prices



to secure his reputation as a fighter (as long as the monopolist doesn’t violate antitrust legislation). Given a monopolist that has secured a strategy that is credible enough to keep entrants out, we now have to ask whether or not this strategy is a good one. To do this, we need to compare the expected present value of profits when we engage in entry‐limit pricing to the expected present value of profits without entry‐limit pricing, or

EPV(π | Limit) > EPV(π | Accommodate). If the profit from limiting is higher, it is a reasonable strategy. In a simple world, we can imagine a monopolist who limits entry and gets a lower profit (πL) this period and every period thereafter, versus one who doesn't limit entry and gets a monopoly profit (πM) this period and an accommodating profit (πA) every period thereafter, or

πL + πL/r > πM + πA/r

where r is the interest rate. Comparing πM with πL, we know the entry-limiting monopolist has taken steps, such as an investment in a plant with a very low marginal cost or a very high capacity but a very high fixed cost, in order to secure the strategic advantage of credibly keeping entrants out. Therefore, the per-period profit of the monopolist that is attempting to limit entry will be lower than that of a monopolist that is simply maximizing profit (πM > πL). The deciding question now becomes how much higher the entry-limit profit is than the accommodating profit (πL − πA). This will ultimately decide whether the strategic advantages are worth the costs of obtaining a reputation as a fighter or reducing marginal cost to the point that entrants are just not interested. That is, is the flow of future extra profits from limiting entry high enough to justify the loss of today's monopoly profit: (πL − πA)/r > πM − πL. There is no reason to think this will necessarily hold, even when entry-limit pricing is feasible, which it may not be.
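A minimal numeric check of this condition, with made-up profit levels and interest rate:

# Hypothetical per-period profits: monopoly, entry-limit, and accommodation.
pi_M, pi_L, pi_A = 100.0, 70.0, 20.0
r = 0.10  # interest rate

# Entry-limit pricing pays off when (pi_L - pi_A)/r > pi_M - pi_L.
gain_from_limiting = (pi_L - pi_A) / r   # 500: present value of the extra future profits
cost_of_limiting = pi_M - pi_L           # 30: monopoly profit given up today
print(gain_from_limiting > cost_of_limiting)  # True, so limiting entry pays here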

Eliminating Competitors with Predatory Pricing
This is a tactic used to drive an existing competitor out of the market; it relates to entry-limit pricing in the sense that the threat must be credible. Thus, temporarily lowering prices to drive out a competitor and then, once your competitor is gone, raising prices to monopoly levels will only attract new competition. For this to really work, it must be rational to lower prices today to push a competitor out of the market, but also to keep prices low enough in the long run to dissuade future entrants from entering the market. To determine if it is rational to use predatory pricing to drive out current competitors, simply compare the present value of profits using predatory pricing with the present value of profits without using predatory pricing. Regarding legality, it is not inherently illegal to price below your opponent's cost. If you eliminate



competitors by having a lower marginal cost, that may even be good for customers. Having a reputation for pricing below your own cost to harm competitors, though, is grounds for more serious legal antitrust trouble.

Pre-Emptive Product Introduction in Differentiated Product Markets
Unless there are barriers to entry, π > 0 leads to entry. When we first introduced differentiated products, there were two firms on opposing ends of a line that represented different levels of a certain characteristic of a good. In reality, there are many different characteristics that firms can use to differentiate their products. Imagine two different properties of a good, x1 and x2, and imagine each point in the figure below represents products of competing firms containing different amounts of each characteristic. Remember, each characteristic could be anything that affects customers' preferences, such as sweetness, location, etc. Suppose the dots in the figure below represent actual products, your firm's product is the light-colored dot, and you are making a profit. The squares represent potential points where another product may be introduced that would attract enough customers to be profitable, but which would reduce your firm's profits. That is, you are vulnerable to entry in the market niches corresponding to the squares. One way to combat this potential threat is to consider preemptive product introduction. In other words, you could sell new products that would take some of the demand away from your initial product, in order to keep other competitors out of the market. Basically, product positioning is a very strategic decision that can be used to keep other competitors out, as well as keep demand focused on your most profitable product.

[Figure: product space in characteristics x1 and x2; dots are existing products, the light dot is your firm's, and squares mark niches vulnerable to entry.]

Softening Competition with a Price Match Guarantee
This is where a firm guarantees that, on a similar product, any lower price charged by a competitor will be matched, and possibly beaten by some additional amount. On the surface, this seems to protect customers who buy from the firm offering the guarantee; in fact, it serves to soften price competition. Suppose two firms each have price match guarantees, and imagine they are selling homogeneous products. Without the guarantee, each firm would want to undercut the other to take away market share, driving price down to



marginal cost. With the guarantee, neither firm has any incentive to undercut the other, since the other firm will just match the lower price; thus, each firm leaves its price at the monopoly level. Thus, price match guarantees effectively restrain competition, and are a mechanism firms use to keep prices high.
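A toy payoff comparison (the profit numbers are invented) shows why the guarantee kills the incentive to undercut.

# Two firms at the monopoly price split industry profit of 100. Undercutting
# without a guarantee captures the whole market at a lower margin; with a
# guarantee the rival matches instantly, so the undercutter just splits a
# smaller pie.
monopoly_profit, undercut_total = 100, 80

no_guarantee = {"stay": monopoly_profit/2, "undercut": undercut_total}      # 50 vs 80: undercutting pays
with_guarantee = {"stay": monopoly_profit/2, "undercut": undercut_total/2}  # 50 vs 40: undercutting loses
print(no_guarantee, with_guarantee)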

Limiting Entry by Raising Costs
Another way to limit entry is to increase costs for the entire industry. One way to do this is by increasing fixed costs. Suppose there is a monopolist in an industry and his profit is πM = 100. If entry were to occur, both firms would make a profit of π = 20. Now imagine the monopolist goes to the government, says his industry contributes to global warming, and suggests the government should charge every firm in the industry $21 per year. This is essentially an increase in fixed costs, since it is not dependent on a firm's output. The profit of each firm if entry occurs is now π = −1, so the entrant does not enter. The monopoly profit becomes πM = 100 − 21 = 79, which is better than 20, so this is a rational strategy to keep potential entrants out. You can also attempt to increase variable costs for the entire industry, but generally you will lose more profit that way than by simply raising fixed costs. As the figure below illustrates, increasing marginal cost results in a higher price and lower quantity, meaning a deadweight loss and lower consumer surplus (due to the higher price). So, when limiting entry by raising costs, it usually makes more sense to raise fixed costs rather than variable costs.

[Figure: an increase in marginal cost from MC0 to MC1 moves the intersection with MR, raising price from p0 to p1 and cutting quantity from q0 to q1 along demand D.]
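The arithmetic of the fixed-cost example above, in a few lines:

pi_monopoly, pi_duopoly = 100, 20   # profits before the lobbied-for fee
fee = 21                            # annual per-firm charge, a pure fixed cost

entrant_profit = pi_duopoly - fee       # -1: entry is no longer profitable
incumbent_profit = pi_monopoly - fee    # 79: still far better than the duopoly payoff of 20
print(entrant_profit, incumbent_profit)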

Pricing Below Cost to Promote Competition - Penetration Pricing
This pricing strategy is used to "penetrate" a market, typically when introducing a new product. A supplier will charge an extremely low price, offer the product for free, or even pay customers to try a very limited amount of their product to create awareness in a market. This strategy is particularly effective when the incumbent firm has some sort of "lock-in," which is when a large group of customers are already loyal to a particular product. A good example would be if a software company that produced a substitute for Microsoft Excel were to pay consumers to use its product, in order to break into the market. While this is pricing below cost, since it promotes rather than hinders competition, it does not run afoul of antitrust rules.

Vertical Foreclosure
Suppose there is a supply chain that has two downstream firms that compete with one another for sales of final products to consumers, and a single upstream firm that provides each downstream firm with a needed input. Now, if one of the downstream firms merges with the upstream firm, there will be a new vertically integrated firm, as well as the other downstream firm. The new firm can now



attempt to squeeze the remaining downstream firm out of the market by marking up the price of the components it sells to them. This is known as vertical foreclosure – using market power at one vertical point in the supply chain to reduce competition at a given horizontal location. Such behavior is an antitrust violation. But it may still not be in the new integrated firm's best interest to engage in this type of pricing anyway: doing so may boost the downstream division's profits, but it does so at the expense of the upstream division's profit. If the upstream division sells components to both the downstream division and the competing downstream firm, it may actually be more profitable for the upstream division to continue selling to both entities. The best pricing decision depends on the strategic situation the firms are in.



Chapter 19 Terminology
The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Concentration Ratio - A measure of how competitive an industry is. Only the largest firms are measured and compared to the industry as a whole. The larger this ratio is, the less competitive the industry.

Entry-Limit Pricing - A tool used by a monopolist to keep prices at a point where an entrant cannot compete, so that the entrant stays out and the monopolist remains the sole firm in the industry. This will be successful only when a credible commitment is made to maintaining low prices if entry occurs.

Herfindahl Hirschman Index - An index that measures the concentration of all firms in an industry. Industries range from near 0 (perfectly competitive) to 10,000 (pure monopoly). The U.S. Department of Justice considers an HHI above 1800 "highly concentrated."

Penetration Pricing - A pricing strategy that is used to enter a market, typically when introducing a new product. Charging an extremely low price, offering the product for free, or even paying customers to try the product are ways to create awareness in the market.

Predatory Pricing - A tool used to drive an existing competitor out of a market by lowering prices to a point where that competitor cannot compete. This will be successful only when a credible commitment is made to maintaining low prices after the competitor exits.

Price Match Guarantee - A strategy used by firms in an oligopolistic industry to soften price competition and keep prices high. The firms guarantee that, on a similar product, any lower price charged by a competitor will be matched. When all firms use this strategy, they will have no incentive to undercut each other and can set prices at the monopoly level.

Vertical Foreclosure - Occurs when there are two downstream firms and one upstream firm in a supply chain, one of the downstream firms merges with the upstream firm, and the integrated firm raises the price of (or stops selling) the input to the remaining downstream firm in order to squeeze it out of the market.



Part 6 Firm Structure



Chapter 20 Input Procurement and Contracting

Ways to Obtain Inputs
Every stage of the production process requires inputs. There are three main ways to get them: 1) buy them in the spot market, 2) make them, or 3) sign a contract with an input supplier. Each is appropriate in different situations. The spot market is just the free-market economy in its purest sense; you go to the market, observe the market price of whatever input you need, and buy as much of it as you require. Using the spot market is the simplest way to procure an input. It has the major advantage of allowing the firm to focus more of its resources on producing the goods it will ultimately sell. Thus, using the spot market to obtain inputs keeps the firm more specialized. The spot market functions well as long as there is near perfect information, products are very standardized, and transactions costs are low. If all of these related conditions are met, supply and demand will accurately represent the cost and value of the good, and the spot market will be able to facilitate trade efficiently. But the spot market tends to break down if any of these three conditions is not met. The more differentiated an input is, or the higher the transactions costs of finding it, the thinner the market will be. These are related; a lack of standardization means a thin market in which much depends on search and negotiation, which means high transactions costs. Similarly, when information is incomplete, it may take a great deal of time just to gather enough information on which to base a decision, and when information is too asymmetric, every transaction can become a time consuming negotiation. A firm that vertically integrates several stages of the production process is internally creating some of the intermediate inputs needed for its final product. This gives the firm the most direct control over the production of the input. When the input needed by the firm is highly specialized and it is simply too hard to negotiate with an input supplier to provide it, this may be the best solution. But it also diverts more of the firm's focus from producing the goods it will ultimately sell to its consumers. So, the firm becomes less specialized. Signing a contract with an input supplier is an intermediate solution. The firm remains more specialized, but some of its resources are devoted to negotiating contracts and maintaining the relationship with the input supplier. The economic aspects of contracting with input suppliers, especially when faced with adverse selection or moral hazard, are the subject of the rest of this chapter. It should go without saying that this is not a course in contract law – but I feel it should be said anyway. The text provides only an overview of the economic aspects of contracting – when and how contracting can increase the profitability of the firm. How to actually negotiate, write, and enforce a contract is, of course, well beyond the scope of a class in managerial economics.



Contracting and Optimal Contract Length
There are both benefits and costs to contracting, and specifically, to contract length. The major benefit of signing a contract is that it avoids the costs of search, bargaining, negotiation, and other transactions costs that would be incurred if the firm had to seek out an input supplier every time it needed more of the input, spell out in detail exactly what it needs, and negotiate purchases one by one. With a contract, all those details are worked out one time in advance, and then, for the length of the contract, they are not incurred again. The higher those types of transactions costs, the higher the marginal benefit of extending contract length a bit. One of the major costs of contracting has to do with the complexity of the contracting environment. How many contingencies can be foreseen? How easily can what to do in each contingency be agreed upon? What must be done so the contract is readily enforceable in each foreseeable contingency? How expensive is it to pay for enough of the time of the firm's managers, lawyers, etc. to actually negotiate the contract? The more complex the contracting environment, and the more expensive it is to negotiate the contract, the higher the marginal cost of extending the contract length a bit. The other major cost of contracting has to do with the cost of being tied down in the future. This cost has primarily to do with raw uncertainty - the unknown unknowns. To the extent that unknowns can be enumerated and assigned rough probabilities, they can be dealt with (perhaps at great expense) in the contingencies spelled out in the contract. But what to do in contingencies that cannot reasonably be foreseen beforehand obviously cannot be spelled out in a contract. The firm might find being tied into a contract excessively restrictive if radically new technologies lead it to adopt a production process that no longer needs the input specified in the contract, for example. The more important this type of raw uncertainty, the higher the marginal cost of extending contract length a bit. The optimal length of a contract balances the marginal benefits of additional contract length against the marginal costs. This is shown in the figure below at L*. An increase in the transactions costs of obtaining the input without a contract increases the marginal benefit of contract length (to MB') and increases contract length (to L') given the initial marginal cost curve. An increase in the complexity of the contracting environment, or in the costs of being tied into a longer contract in the face of uncertainty, increases the marginal cost of contract length (to MC") and decreases contract length (to L") given the initial marginal benefit curve.

[Figure: marginal benefit and marginal cost of contract length; MB and MC cross at the optimum L*; a shift up to MB' moves the optimum to L' > L*, and a shift up to MC" moves it to L" < L*.]
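A sketch of the optimal-length calculation under assumed linear marginal benefit and marginal cost curves (the functional forms and numbers are mine, purely for illustration):

# Assume MB(L) = a - b*L and MC(L) = c0 + d*L; the optimum sets MB(L) = MC(L).
def optimal_length(a, b, c0, d):
    return (a - c0) / (b + d)

L_star = optimal_length(a=10, b=1, c0=2, d=1)       # baseline: L* = 4.0
L_prime = optimal_length(a=14, b=1, c0=2, d=1)      # MB shifts up (higher spot-market transactions costs): L' = 6.0
L_dbl_prime = optimal_length(a=10, b=1, c0=5, d=1)  # MC shifts up (more complexity/uncertainty): L'' = 2.5
print(L_star, L_prime, L_dbl_prime)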



Contracting and Asymmetric Information
Sometimes different parties to a transaction have significantly different information regarding certain aspects of the transaction – there is asymmetric information. The simplest example is an individual who wishes to sell a box full of money, where only he (and not the buyer) knows how much is inside. The buyer could reasonably conclude that the seller wouldn't be willing to sell the box for anything less than the money in it, and thus would never want to buy the box. This is because of the asymmetric knowledge the seller has. In general, there are two categories of asymmetric information. The first is adverse selection, which is when one party knows some characteristics about themselves that are hidden from the other. Insurance markets provide a good example of an adverse selection problem. Suppose an insurance company provides health insurance to two types of people, one high-risk and one low-risk. In order for the insurance company to break even on expected losses, it must charge a premium somewhere in between what it expects to lose on the low-risk and high-risk types. Thus, it is possible the low-risk types will determine they are being over-charged by the insurance company to compensate for the high-risk types, and decide insurance is not worth buying. The company can reasonably predict this, and as a result it will charge the remaining high-risk types a higher premium. This result is where the name adverse selection comes from - because there are hidden characteristics about the people who desire health insurance, the company ends up insuring the people who are most likely to get sick and thus to file claims. The second category of asymmetric information is moral hazard, which is when one party's actions are hidden from another party. An example of a moral hazard problem is a driver who has auto insurance. Since the insurance company cannot monitor the driver's behavior all of the time, the driver's actions are hidden from the company. If the driver is fully insured, he has less of an incentive to drive carefully. Both types of information asymmetry affect the contracting problems faced by firms when procuring inputs. When a firm purchases an intermediate good, it is very likely that the input supplier will eventually know more about the cost of producing the input than the firm purchasing it will. That gives the supplier an incentive to overstate costs in an attempt to extract a higher payment. When a board of directors hires a CEO, the CEO will know much more about how hard he worked than will the board of directors. We consider each of these contracting situations in turn.

Contracting in Principal-Agent Relationships with Adverse Selection
Imagine a principal is contracting with an agent for a task, such as fixing a car. At the time of the contract, the principal does not know how hard or expensive (in time, money, or utility terms) the work will be. The agent may or may not know how difficult the task will be at the time the contract is signed, but he will find out before the task is undertaken. If the task is expensive, we will assume the agent can walk away without fulfilling the contract if fulfilling it would bankrupt him. That is, the



contracts are enforceable only as long as the agent remains solvent – bankruptcy laws do not allow the principal to force the agent to take a loss. The principal wants to make sure that the agent does not lie and overstate cost in the event the job is easy, but also to make sure payment is high enough if costs are high to prevent the agent from taking a loss, because then the agent would walk away and the task would not get done. Because of this, the principal has to allow for two different payments within the contract: a payment if the work turns out to be hard, and a payment if the work turns out to be easy. The principal will never know by direct observation whether the work was hard or easy; he has to rely only on what the agent tells him. So, the principal is at an information disadvantage resulting from hidden characteristics about the nature of the agent’s work. Assuming the principal must get the job completed, such as the repair of his car, he must allow in the contract the possibility that the repair will be expensive, and thus the possibility that he will have to pay a high price. The problem with allowing for this possibility is that the agent now has the incentive to lie about the difficulty of the job in order to get the high payment. The solution to this problem is to arrange the contract in a way that incentivizes the agent to be honest about the true costs of his work. Even so, the information disadvantage will prove costly to the principal.

Model
To develop a model for this contracting problem with adverse selection, assume we want to procure some inputs from an agent, such as bolts for some piece of machinery we are building. We don't know ultimately how much it is going to cost the agent to produce these inputs, but at the time of the contract we have some idea of how likely it is that cost will be high or low. For the purposes of this model, we make the following definitions.

q is the number of inputs the agent produces; it can be either the amount he produces if costs are high (qH) or the amount he produces if costs are low (qL). Note qL > qH.

V(q) is the value to our firm of having q units of the input; it is the profit that we can make in the future using these inputs, before subtracting the costs of input procurement.

f is the probability the cost (to the agent) of making our inputs is low (CL(q)).

1 − f is the probability that the cost (to the agent) of making our inputs is high (CH(q)).

P is the total (delivery) price called for in the contract, which is based on how many units the agent produces; it can be either the total delivery price if costs are high (PH) or the total delivery price if costs are low (PL).

We assume that the principal has the bargaining power; in other words, the principal is creating this contract as a "take it or leave it" deal which is offered to several competing production firms. As described above, we also assume the agent



has the ability to file for bankruptcy. This implies that if we don’t pay the agent enough to cover his costs of production, he will file for bankruptcy and get out of the contract. We have two participation constraints that follow directly from this:

1. PH ≥ CH(qH)

2. PL ≥ CL(qL)

These constraints ensure that the contract prices cover the agent's costs of production. We also have two incentive constraints that are used to make sure the agent is honest about his costs. The first one is that the agent must get at least as much (or a little more) profit from producing qH units if cost turns out to be high as he would from producing qL units, or

3. PH − CH(qH) ≥ PL − CH(qL)

Similarly, we want the agent to get at least as much (or a little more) profit from producing qL units if cost turns out to be low as he would from producing qH units, or

4. PL − CL(qL) ≥ PH − CL(qH)

Just like in menu pricing, we have two participation constraints and two incentive constraints; and just like in menu pricing, only one of each type binds. Looking at the incentive constraints, #3 induces the agent not to claim costs are low when they are really high. Since a supplier would never come to a principal and say cost is only $5 when in reality it is $10, we don't have to worry about constraint #3 binding. Looking at the two participation constraints, #2 says the price of the low-cost contract must be greater than or equal to the cost of producing qL units. If this contract price were ever below the cost of production, the supplier could simply say that costs were high, produce the lesser quantity qH, and still make a profit. So, we don't have to worry about constraint #2 binding. Given this information, the model tells us that we want to maximize value less input procurement costs, which is

f(V(qL) − PL) + (1 − f)(V(qH) − PH),

subject to the constraints

1. PH = CH(qH)

4. PL − CL(qL) = PH − CL(qH)

Rearranging constraint #4, and plugging in constraint #1, we get

PL = PH − CL(qH) + CL(qL)

PL = CH(qH) − CL(qH) + CL(qL)



This last version of constraint #4 should make intuitive sense. It just says that if the agent says cost is low, we pay them their costs of production, CL(qL), plus the profit they would make if they lied and claimed cost was high, CH(qH) − CL(qH). That covers their cost and keeps them honest. Substituting our constraints into the original problem we get

f(V(qL) − CH(qH) + CL(qH) − CL(qL)) + (1 − f)(V(qH) − CH(qH))

Now we can solve for qL and qH by maximizing this expression. Maximizing with respect to qL, we get

∂E(π)/∂qL = f(dV(qL)/dqL − dCL(qL)/dqL) = 0

and dividing by f we get

MB(qL) = MCL(qL)

which simply says marginal benefit at qL equals marginal cost. Maximizing with respect to qH, we get

∂E(π)/∂qH = f(−MCH(qH) + MCL(qH)) + (1 − f)(MB(qH) − MCH(qH)) = 0

Rearranging this, we get

MB(qH) − MCH(qH) = [f/(1 − f)](MCH(qH) − MCL(qH))

Looking at this equation, we see that marginal benefit equals marginal cost plus the extra term on the right. This is because the cost to us of increasing qH in our contract is the production cost plus the fact that the low‐cost producer becomes more tempted to lie and say cost is high. Thus, we request fewer units for qH than we would if we didn’t have to worry about the low‐cost producer lying. The following illustrates the adverse selection contracting problem in a graph. Since the firm is the principal and is paying the agent to procure a certain quantity of inputs, the marginal benefit curve (MB) is the firm’s marginal benefit from q units. The agent will either be a high cost producer (MCH) or a low cost producer (MCL). Then, qLe is the quantity that maximizes value added if cost is low, and qHe is the quantity that maximizes value added if cost is high.

[Figure: the marginal benefit curve MB(q) crosses MCL at qLe and MCH at qHe.]

From the above discussion, we know when cost is low we want MB = MCL; so, the quantity in the low‐cost contract will be qLe. The quantity in the high‐cost contract,



however, won’t be the quantity that maximizes value added. This is due to the propensity the low‐cost producer has for lying and saying costs are high. Let’s look at another graph to illustrate this point. Let’s suppose for now the probabilities of high and low cost are equal. We know the area under the MC curve is total variable cost (ignoring any quasi fixed costs), and we know from the participation constraints that the contract price (P) has to cover the total cost. If we were to demand qHe units in the high‐cost contract, the price of the high contract would be as shown in the left figure below. Now, suppose the cost to the producer turns out to be low. If he lies and says that cost is high, he will get this high contract price; since his actual costs are based on the MCL curve, his profit will be as shown in the figure on the right. MCH

MCH MCL

MCL

π

PH MB(q)

MB(q) q

qHe

q

qHe

The only way to reduce the producer’s incentive to lie is to restrict qH; by doing this, we reduce the profit he makes from lying when costs are low; this means we gain some profit because of the cheaper cost of the high‐cost contract. But what happens to our profits if costs are actually high? Since we’d be demanding a quantity that is less than the quantity that maximizes value added (qHe), we are losing potential profits if costs turn out to be high. This is shown in the left panel of the figure below. So, every time we restrict qH, we lose some profit because the next unit has a higher marginal benefit than its marginal cost, but we gain some profit because the contract price we have to pay for the high‐cost producer is lower. We will then continue to restrict qH until these two marginal effects offset each other. This is shown in the right panel of the figure below. MCH

‐∆πH

MCL

∆πL

MCH

‐∆πH

MCL

∆πL

MB(q)

MB(q) qHe

q

336

qH qH*

q


This is precisely where our earlier solution for qH comes from; the distance −∆πH is MB(qH) − MCH(qH), and the distance ∆πL is MCH(qH) − MCL(qH). Note that setting these two distances exactly equal is only correct when f = 1 − f; in general, ∆πL is weighted by f/(1 − f), as in the first-order condition above. As f increases, the probability of low cost increases, and as a result we have to restrict qH further as it becomes more likely the producer will lie about having a high cost. As f decreases, it is more likely that cost will be high, so the gain to reducing the incentive for lying when cost is low is smaller and, as a result, qH will be higher.

Example
Let V(q) = 10q − (1/4)q², f = 0.5, CL(q) = 0.5q, and CH(q) = 2q. The two binding constraints are

PH = 2qH and PL − (1/2)qL = PH − (1/2)qH.

Substituting the first into the second and solving for PL we obtain

PL = (3/2)qH + (1/2)qL.

The expected profit function is

E(π) = f(V(qL) − PL) + (1 − f)(V(qH) − PH)

and substituting from the constraints gives:

E(π) = (1/2)(10qL − (1/4)qL² − (3/2)qH − (1/2)qL) + (1/2)(10qH − (1/4)qH² − 2qH).

Maximizing this we find

dE(π)/dqL = (1/2)(10 − 0.5qL − 0.5) = 0, so qL = 19,

and

dE(π)/dqH = 0.5(−1.5) + 0.5(10 − 0.5qH − 2) = 0, so qH = 13.

If, on the other hand, we had perfect information, we would simply find our quantities by setting marginal benefit equal to marginal cost:

10 − (1/2)qL = 1/2 and 10 − (1/2)qH = 2, so

qL = 19 and qH = 16.

So, we can see that we reduce our contract for the high-cost producer by 3 units because of the information asymmetry.
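The example can be checked numerically. The sketch below prices each (qL, qH) pair according to the binding constraints and scans a fine grid, confirming the maximum of expected profit is at qL = 19, qH = 13.

# Setup from the example: V(q) = 10q - q**2/4, f = 0.5, CL(q) = 0.5q, CH(q) = 2q.
f = 0.5
V = lambda q: 10*q - 0.25*q**2
CL = lambda q: 0.5*q
CH = lambda q: 2*q

def expected_profit(qL, qH):
    PH = CH(qH)                      # binding participation constraint #1
    PL = PH - CL(qH) + CL(qL)        # binding incentive constraint #4
    return f*(V(qL) - PL) + (1 - f)*(V(qH) - PH)

# Scan quantities from 0 to 30 in steps of 0.1.
grid = [(i/10, j/10) for i in range(301) for j in range(301)]
print(max(grid, key=lambda pair: expected_profit(*pair)))  # (19.0, 13.0)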



Up to this point, we’ve been discussing adverse selection contracts in scenarios where the producer is risk neutral but can file for bankruptcy. The reason this was important was that it required us to fully compensate the producer if cost turned out to be either high or low. Suppose, however, that the agent can afford to take the loss if cost turns out to be high. Then, if the agent signed a contract that will lead to a loss if cost is high, they can’t get out of it if cost really does turn out to be high, they just have to take the loss. If this is the case, the timing of the uncertainty of the contract and the attitude of the agent toward risk become important. If neither party knows what costs are going to be at the time of the formation of the contract, AND the agent is risk neutral, the principal can simply demand the quantity that maximizes value added and pay a contract price that compensates the agent for expected costs; that is, the price of the contract takes into account the likelihood of costs being either high or low. The participation constraint is simply:

f(PL − CL(qL)) + (1 − f)(PH − CH(qH)) ≥ 0.

Then, if the agent takes the contract, he will be forced to produce the requested quantity; if cost turns out to be low, he will make a profit, and if cost turns out to be high, he will take a loss. So, if the agent can afford to take a loss on a high-cost contract, we don't face the "bankruptcy constraint" that we dealt with in the first method. In that case, we get the efficient quantity for either high or low cost because, at the time of signing the contract, the agent has no information advantage over the principal, and there is no way out of the contract once it is signed. Now suppose that the agent we're contracting with is not a firm that produces inputs, but rather an individual. When we were dealing with a firm, we just had to worry about covering its costs of production, because firms are risk-neutral. We know, however, that individuals are risk-averse. So, if we want an individual to participate in our contract, we need to look at their expected utility, as opposed to just expected costs. If f is the probability that the job is easy and uR is the individual's reservation utility, the participation constraint then becomes something like:

fu(PL − CL(qL)) + (1 − f)u(PH − CH(qH)) ≥ uR.

If we’re dealing with an individual, however, introducing uncertainty about the outcome introduces risk into the individual’s compensation. Since the individual is risk averse, they will demand higher compensation on average to make up for the risk. So, there is a tradeoff – making the high cost option less attractive increases risk and therefore increases expected compensation costs, but, it also decreases the incentive to lie, reducing expected compensation costs. The optimal contract will strike a balance between these two opposing forces.



Contracting in Principal-Agent Relationships with Moral Hazard
Now we're going to focus on contracting issues between principals and agents in the context of a moral hazard problem – when the agent knows more about his own actions than does the principal. Suppose a board of directors is hiring a new CEO for a company; then, the board of directors is the principal of the relationship, and the CEO is the agent. The issue of moral hazard arises immediately because the principal is unable to directly observe all the work that the agent does. As a result, even if the principal feels the agent didn't work hard at the task, it would be hard to tangibly prove it in a court. This leads us into a discussion about what the principal can do to motivate the agent to work diligently at his tasks. The natural solution to this difficulty is for the board of directors (the principal) simply to pay the CEO (the agent) based on company performance. That is, the board is hiring the CEO to make the firm profitable; so, pay him if the firm ends up making a profit, and don't pay him if the firm doesn't make a profit. This is a suitable solution in a world with no uncertainty. If we could be certain the firm would be profitable if the CEO worked hard, and unprofitable otherwise, then we could pay him accordingly. But what if there were some way the firm could be profitable even if the CEO did not work hard, or unprofitable even if he does work hard? We wouldn't want to punish an industrious CEO simply because of something that wasn't under his control, or overly reward one who just gets lucky. Because these types of situations occur in the real world, we have to find a way to reconcile this uncertainty. The board can still incentivize the CEO to work hard only by paying him in a way that is contingent on company profits. From the CEO's perspective, however, the contract has become risky; that is, he knows there are certain elements that influence company profits that aren't under his control. We know individuals are risk averse. Since his salary is now tied to company profits, and company profits are contingent on uncertain things that he has no control over, the company will have to offer more money to compensate the CEO for this added risk. The principal thus faces a trade-off between risk and incentives when designing a contract for the agent. To make it more likely that the CEO will work hard, the principal has to add greater incentives for working hard. Incentivizing the CEO to work hard leads to higher profits for the company. In the face of these high incentives, though, the contract has become more risky; the CEO will demand higher average pay because of this, which leads to lower profits for the company. The optimal contract balances these two effects. In addition to profit, there are other indicators to use as signals of the effort of the agent, such as costs, sales, complaints, reports of monitors, and the performance of other firms (yardstick competition). The idea is to tie the pay of the agent to the signals that best represent the job he is being hired for. So, if the agent is a CEO, overall company profit is a good signal. If the agent is a production manager, tying his contract to the firm's production costs is a better way to incentivize him to work hard at what he's supposed to do – it is less random and more under his influence.



Model
For the purposes of this model, we make the following definitions.

V is the value of the firm, which can be either high (VH) or low (VL).

e is the effort of the agent, which can be either high (eH) or low (eL).

f is the probability that the value of the firm is high given that the agent works hard, or Pr(VH|eH).

g is the probability that the value of the firm is high given that the agent doesn't work hard, or Pr(VH|eL).

u(w) − c(e) is the agent's utility function: the utility he gets from the wealth the contract pays, u(w), minus the utility cost, or disutility, of putting forth effort. For simplicity, assume the cost of high effort is c (that is, c(eH) = c) and the cost of low effort is 0 (that is, c(eL) = 0).

uR is the reservation utility, which is the utility of the agent's next-best option. If you don't meet this level of utility with your contract, the agent won't take it.

wR is the reservation wage, which is just the single amount of wealth that corresponds to the reservation utility for a job requiring low effort; that is, uR = u(wR).

Since we have limited the possibilities to two (high value or low value), the realized value is the only signal on which to base a contract. Let wH represent the wage paid to the agent if value is high, and wL the wage if value is low. Note that we can also think of these two wages in terms of a base pay and a bonus; that is, wL = wBASE and wH = wBASE + bonus. We will assume that the principal wants the agent to work hard. Given this, the principal wants to minimize expected compensation costs. The expected cost of hiring the agent is the probability of high value times the high wage, plus the probability of low value times the low wage, assuming the agent works hard, or

fwH + (1 − f)wL.

This minimization problem is subject to two constraints. The first constraint is the participation constraint. This says that you must pay the agent enough to actually induce him sign the contract, which means his expected utility from the contract must be greater than or equal to his reservation utility, or

E(u | eH) = f(u(wH) − c) + (1 − f)(u(wL) − c) ≥ uR.

We use f here because we are only concerned with getting the agent to participate when he works hard; we aren’t trying to get him to participate when he doesn’t work hard. If we expand the above equation, we get

E(u | eH) = fu(wH) + (1 − f)u(wL) − c ≥ uR.



Since the opportunity cost of the agent signing our contract is uR, this constraint binds; that is, if we don’t meet this constraint, the agent walks away. We want to make sure that his expected utility from the contract is as much as his reservation utility – but making it greater just takes profits away from the firm. So, equality holds for this constraint. The second constraint is the incentive constraint. This constraint says the agent’s expected utility from working hard must be greater than or equal to his expected utility from shirking, or

E(u | eH) ≥ E(u | eL)

We’ve defined the expected utility from working hard above; the expected utility from shirking is the chance value is high given that he doesn’t work hard times the high wage, plus the chance value is low given that he doesn’t work hard times the low wage, or

E(u | eL) = gu(wH) + (1 − g)u(wL) − 0.

Notice the cost of effort here is 0, since the cost of shirking (using our notation) is 0. Then, the incentive constraint becomes

fu(wH) + (1 − f)u(wL) − c ≥ gu(wH) + (1 − g)u(wL).

This constraint also binds, meaning it holds with equality. If we don't satisfy it, the agent has no incentive to work hard. If the agent works hard, he incurs an extra cost of c and gets a higher probability of wH, since f > g. So, the only way to compensate for the cost of effort c is to increase the bonus given for success by increasing wH or decreasing wL. Recall from our earlier discussion that increasing the bonus introduces risk into the contract, which increases the average salary that we have to pay the agent; so, we want to increase the "bonus" to the point where he gets just as much utility from working hard as he does from shirking, but no higher, as that would just increase the firm's contracting costs. So, the constraint will be met with equality. To solve this problem, let's start by working from the incentive constraint. Rearranging the constraint for the cost of effort we get

(f − g)u(wH) + (1 − f − 1 + g)u(wL) = c, or

(f − g)(u(wH) − u(wL)) = c.

This form of the incentive constraint has an intuitive explanation. The right‐hand side is just the disutility from working hard. The left‐hand side is the extra probability of high value from working hard, times the extra utility from the high wage. So, the left‐hand side is the expected benefit from working hard, and the right‐ hand side is the expected cost of working hard; as long as the agent “covers” the cost of high effort by the expected benefit he gets from putting forth high effort, he’ll be incentivized to work hard.



Finally, we will rewrite the equation once more in order to make our calculations easier:

u(wH) − u(wL) = c/(f − g)

Now let’s rearrange the participation constraint:

u(wL) + f(u(wH) − u(wL)) − c = uR

In this form, the participation constraint intuitively says the agent’s utility from the base pay, plus his expected utility from the bonus, minus his utility from working hard should equal his reservation utility. Substituting the final form of our incentive constraint into our participation constraint and rearranging, we get

u(wL) + fc/(f − g) − c = uR

u(wL) = uR + c − fc/(f − g)

u(wL) = u(wR) − [g/(f − g)]c.

This says that the utility of the base wage equals the utility of the reservation wage minus a multiple of the cost of effort. The more likely it is that the agent gets the bonus for doing nothing, the lower the base wage must be, in order to discourage the agent from doing nothing. Similarly, the higher the probability that the agent gets the high wage from working hard, the higher you can make the base wage; this is because the more likely it is that hard work leads to high pay, the less you have to discourage the agent from doing nothing. We now need an expression for the high wage. We can obtain this by plugging the expression for the low wage into the incentive constraint. This gives:

u(wH) = u(wL) + c/(f − g)

u(wH) = u(wR) − [g/(f − g)]c + c/(f − g)

u(wH) = u(wR) + [(1 − g)/(f − g)]c

This says the high wage must cover the reservation wage plus a multiple of the cost of effort. Looking at the multiple, we see that the higher f is, the lower we can make the high wage. This is because the more likely it is that the agent gets the bonus from working hard, the less we have to do to incentivize him to work hard, and thus we can reduce the riskiness of the contract by reducing the high wage.



Conversely, as g increases, the higher we must make the high wage. This is because g is essentially the probability that the agent gets lucky while shirking, and the higher this probability becomes, the more we have to incentivize the agent to work hard by raising the bonus. The following table summarizes the implications of an increase in each exogenous variable of the model for the low and high wages we have to pay our agent (the signs follow from the two wage equations above).

Exogenous variable    wH    wL
wR                     +     +
c                      +     −
f                      −     +
g                      +     −
Now, let’s compare this scenario with one in which the agent completely determines the value of the firm, and the firm has complete information about the agent. In this case, f = 1 and g = 0. As f approaches 1, observe what happens to the high wage. Looking at the utility of the high wage

u(wH) = u(wR) + [(1 − g)/(f − g)]c

and plugging in f = 1 we get

u(wH) = u(wR) + [(1 − g)/(1 − g)]c

u(wH) = u(wR) + c

which says that we just have to set the high wage high enough to compensate for the reservation wage plus the cost of working hard. As g approaches 0, looking at the utility of the low wage

u(wL) = u(wR) − [g/(f − g)]c

when we plug in g = 0 we get

u(wL) = u(wR) − [0/(f − 0)]c

u(wL) = u(wR).

This says base pay is just equal to the reservation wage. So, with perfect information, we’d pay a hard‐working CEO his reservation wage plus the cost of working hard, and we’d simply pay a CEO that shirks his reservation wage. The extent to which our actual numbers differ represents the cost of



uncertainty due to moral hazard; it can be attributed to the fact that the signals we are using are imperfect, or noisy.

Example
Let f = 0.8, g = 0.4, u(w) = w^0.5, uR = 10, and c = 8. The participation constraint is

0.8wH^0.5 + 0.2wL^0.5 − 8 = 10

and the incentive constraint is

0.8wH^0.5 + 0.2wL^0.5 − 8 = 0.4wH^0.5 + 0.6wL^0.5.

Solving these two equations, we get

wL = 4 and wH = 484.

Since we've satisfied both constraints (indeed, that's how we found our wages), we know for sure that our agent will work hard; then, the expected wage that we will pay our agent is

E(w) = 0.8(484) + 0.2(4) = 388.

If we were operating under complete certainty, we would set the wage where the agent's utility from working hard equals his reservation utility, or

wCERTAINTY^0.5 − 8 = 10, so wCERTAINTY = 324.

Thus, the cost due to asymmetric information is

E(w) − wCERTAINTY = 388 − 324 = 64.
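A quick script confirming these wages by solving the two constraints in terms of x = wH^0.5 and y = wL^0.5:

# Incentive: 0.8x + 0.2y - 8 = 0.4x + 0.6y  =>  0.4(x - y) = 8  =>  x = y + 20.
# Participation: 0.8x + 0.2y - 8 = 10  =>  0.8(y + 20) + 0.2y = 18  =>  y = 2.
y = (18 - 0.8*20) / 1.0      # 2.0
x = y + 20                   # 22.0
wL, wH = y**2, x**2          # 4.0 and 484.0, matching the text

expected_wage = 0.8*wH + 0.2*wL   # 388.0
w_certainty = (10 + 8)**2         # 324: from w**0.5 - 8 = 10
print(wL, wH, expected_wage, expected_wage - w_certainty)  # cost of moral hazard: 64.0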

Pitfalls of Incentive Contracts
The first thing that can go wrong with incentive contracts is when a principal acts in the best interest of the agent, as opposed to the company. For example, if a board of directors were hiring a new CEO, but there were ties between the board members and the CEO, clauses built into the contract may actually just be golden parachutes that serve no purpose in actually incentivizing the CEO to work hard. We won't elaborate any more on this issue because in general we are assuming that the management of the company will act in the shareholders' best interests. Given that the principal is acting on behalf of the company, commitment becomes the next major issue. We've designed an incentive contract to ensure that the CEO behaves a certain way; it is crucial that we act exactly as we've specified in the contract in order for it to work. The problem is that we know we're only going to pay the CEO the high wage if the company is profitable; but even if the CEO works hard (which we know he will, since we've designed the contract that way) there's a chance the company will not be profitable, and thus we will have to pay him the low wage. Even if we know he's worked hard, and the company fails due to something out of his control, we cannot pay him any sort of "sympathy" pay, as this would skew his incentives from the beginning. So, even though it may seem brutal, both parties must commit to everything in the incentive contract for it to work properly.



Another issue arises when parties renegotiate their contracts. If management has an incentive contract with a new CEO, and the CEO does well in his first year, management may rewrite the contract at the start of the next year with “higher” incentives to induce the CEO to improve on the previous year. The problem is that the CEO may anticipate this, giving him a motive not to improve as much as he could in order to leave himself room for improvement over the long term. This is known as the ratchet effect.

Finally, since we are talking about powerful incentive contracts designed to overcome serious information problems, it is good to be aware of the law of unintended consequences. For instance, in our earlier example we found the high-effort wage to be 484; if this were in millions of dollars, and this bonus wage were used as a way to get management to work hard, you would also have given the accountants an incentive to falsify the balance sheets. So, it’s important to make sure your contracts measure things that actually add value to the company, to prevent the agent from finding an easier way of earning the bonus.



Chapter 20 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Asymmetric Information – A state in which one party knows more than others.

Moral Hazard – A case of information asymmetry in which one party’s actions are hidden from another party.

Incentive Contract – A contract that works to decrease information asymmetry by allowing the contractor to legally bind an agent to work hard instead of shirking.

Adverse Selection – A case of information asymmetry in which one party’s characteristics are hidden from another party.

Procurement Contract – A contract that works to decrease information asymmetry by allowing the contractor to legally bind the agent to tell the truth about the costs of production.

Participation Constraint – A constraint that must hold in order for a party to participate. In contracting with moral hazard, the contractor must pay the agent enough to actually make him sign the contract; the agent’s expected utility from the contract must be at least as much as his reservation utility. In contracting with adverse selection, the contractor must pay the agent enough to cover costs: the price paid if costs end up being high must be at least the high cost of the job, and the price paid if costs end up being low must be at least the low cost of the job.

Reservation Utility – The utility an agent can receive at his next best job.

Incentive/Selection Constraint – A constraint that must hold in order for a party to act a certain way or buy a certain membership or bundle. With moral hazard, the contractor must pay the agent enough to work hard instead of shirking; the agent’s expected utility from working hard must be at least his expected utility from shirking. With adverse selection, the contractor must pay the agent enough to be honest about the cost of the job: the agent must get at least as much profit from producing the high-cost number of units when costs are high as from producing the low-cost number of units, and at least as much profit from producing the low-cost number of units when costs are low as from producing the high-cost number of units.



Chapter 21
The Firm

In focusing on maximizing profit, and therefore shareholder value, we simply assumed firms exist and that they are owned by shareholders. But why do firms exist in the first place?

As argued long ago by Adam Smith in “The Wealth of Nations,” division of labor and the resulting specialization boost productivity. Imagine that we are dealing with some long process for producing a final good valued by an end user. The good begins as raw materials and undergoes transformations at many intermediate steps until the final product is ready for the end consumer. We can imagine an economy in which everyone specializes in whatever they are comparatively best at. Each individual would purchase “inputs” from the individuals who best perform the previous intermediate step, perform their own part of the task, and then sell the resulting “output” to the individuals who perform the next intermediate step, and so on, until the final product is produced. This hypothetical economy would function through the interaction of very specialized individuals, or proprietors, with no role for firms. In this completely decentralized economy, if the value of a good increases or decreases, so does the good’s price through the interaction of supply and demand. These price signals induce resources to flow toward their highest valued uses.

In contrast, many distinct operations occur within a single firm, which operates under a single centralized ultimate decision making authority. The same firm can buy raw inputs, turn them into numerous intermediate goods, assemble them to produce a final good through many related steps, and employ accountants, marketers, and even janitors to facilitate the firm’s activities. This is the opposite of specialization in some sense, since a single decision making entity performs a multitude of functions, some of which are only very loosely related to others. Within the firm, there are no price signals, since everything is directly under the control of management; thus, resources won’t be automatically allocated to their highest valued uses. Rather, the firm relies on the centralized authority to formulate the plan on which it operates. Why not rely on the market for those decisions and have more specialization? Why have firms at all?

To see the answer, we take a step back. Every stage of the production process requires inputs, and those inputs must be procured in some way. There are three main ways an entity can obtain inputs. The first is to simply buy them in the spot market. The spot market is just the free-market economy in its purest sense; you go to the market to determine the market price of whatever input you need and buy as many units as you require. If all inputs could be readily and efficiently obtained in the spot market, firms would be superfluous. As covered in the last chapter, the spot market functions well as long as there is near perfect information, products are very standardized, and transactions costs are low. If all of these related conditions are met, supply and demand will accurately represent the cost and value of a good, and the spot market will be able to



facilitate trade efficiently. But the spot market tends to break down if any of these three conditions is not met. The more differentiated an input is, or the higher the transactions costs of finding it, the thinner the market will be. These conditions are related: a lack of standardization means a thin market in which much depends on search and negotiation, which means high transactions costs. Similarly, when information is incomplete, it may take a great deal of time just to gather enough information on which to base a decision, and when information is too asymmetric, every transaction can become a time consuming negotiation.

If the spot market is inadequate, the next possibility is to negotiate a contract to procure the input; that was considered in detail in the last chapter. If contracting is too costly, the only remaining alternative is to make the input yourself, which is exactly what a firm does. A firm that vertically integrates several stages of the production process internally creates some of the intermediate inputs needed for its final product. So, the firm may be viewed as a way to improve efficiency by internalizing spot market failures and centralizing decision making authority over them when it is not cost effective to overcome them via contracting.

We will now look more closely at three specific types of input market failures that can occur when inputs are not standard, information is imperfect, transactions costs are high, or input markets are not competitive: team production, relationship-specific investment, and double marginalization.

Team Production and Free Riding

Team production refers to a situation in which multiple aspects of a production process are for some reason inseparable. An example is the design of a new car: a very specialized procedure that requires several groups of people, each with specific knowledge about different aspects of automobiles, to come together and agree on the design, working in a way that is inherently interactive and synergistic. Suppose a group of ten individuals gets together and begins work on this new car, and imagine they plan to split whatever profit the group ends up with evenly among all members. Now, suppose one of the team members could cancel weekend plans with his family and thereby add $4,000 of value to the project. If the work is not done that particular weekend, the opportunity will be missed, because the project will be at too late a stage to implement the change. Since he will ultimately get only one tenth of the profit the team makes, his profit will increase by $400. Suppose he would only be willing to cancel his weekend plans if he made at least $500. Then he will not make the change, and $3,500 of potential value added will be lost (the $4,000 increase less the $500 opportunity cost of the canceled plans). This is the nature of the free riding problem in team production situations. It results from individuals bearing all the costs of their individual efforts but receiving only a share of the benefits. This leads individual team members to put in less effort than optimal, instead “free riding” on the efforts of other group members.
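The arithmetic of this example generalizes directly: a member contributes only if his share of the added value covers his personal cost. A minimal sketch of ours (the $4,000, $500, and ten-member figures come from the example above; the function name is illustrative only):

```python
# Free-riding logic from the example: a member who can add value v to
# the team at personal cost c, but keeps only a 1/n share of v,
# contributes only if the share covers the cost.
def contributes(v, c, n):
    return v / n >= c

print(contributes(4000, 500, 10))  # False: a $400 share < $500 cost
print(contributes(4000, 500, 1))   # True: a sole owner would do the work
```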



Model
Suppose players A and B are working together in a team production situation and split all proceeds evenly. For the purposes of this model, we make the following definitions.

e: The amount of effort each individual puts forth. Player A’s effort is eA and player B’s is eB. Effort measures both the duration and intensity of work.

V: The value generated by the project, a function of the total effort put into the project by all team members. In this case, the value of the project is V(eA, eB).

C: The cost of effort, measured in terms of the amount of monetary compensation the individuals would require to voluntarily put forth a specified amount of effort. Player A’s cost of effort is CA(eA) and player B’s is CB(eB).

S: The surplus generated by the project, or gross value less the cost of effort.

Suppose both players work together to create as much joint, or total, surplus as possible. In this case, the total surplus of the project is

STOT = V(eA, eB) − CA(eA) − CB(eB).

Maximizing total surplus with respect to player A’s effort, we get

∂STOT/∂eA = MBA − MCA = 0,

where MBA is the marginal benefit of player A’s effort, and MCA is the marginal cost of player A’s effort. For player B, we get the same thing:

∂STOT/∂eB = MBB − MCB = 0.

If everyone on the team is working efficiently, they should all be working where the marginal benefit of their effort equals the marginal cost of their effort. Let this level of effort be called e*. The figure below shows A’s marginal cost and marginal benefit of effort assuming B puts forth effort eB*; the efficient joint effort level for A is eA*. The same would be true for B.

[Figure: MCA rising and MBA falling in eA, crossing at the efficient effort eA*.]

Let’s now consider the Nash equilibrium if the players play this game uncooperatively, that is, each maximizing his own surplus rather than the team’s total surplus. Player A’s surplus is

SA = V(eA, eB)/2 − CA(eA),



since he only gets half of the value of the project, and bears all of the costs of his own effort. Maximizing we get

dSA/deA = MBA/2 − MCA = 0.

The figure below shows the marginal benefit of A’s effort, assuming player B puts in the jointly efficient effort level eB*. Since there are two people in the group, the marginal benefit that player A actually receives is MBA/2; the rest goes to player B. If player A acts in his own self-interest, he will put forth effort êA, the level that maximizes his own surplus by setting his marginal cost equal to the portion of the marginal benefit he receives. For any given level of effort put forth by player B, player A will want to work less than the effort level that maximizes total surplus. Player B will act in the same way. Thus, each player has an incentive to “slack,” that is, to work less than the amount that would be best for the team as a whole. At the Nash equilibrium, both players put in less than the efficient level, but each puts in the level that maximizes his own surplus given the other player’s effort.

[Figure: MCA, MBA, and MBA/2 against eA; the self-interested choice êA falls below eA*.]

The value of the project depends on the level of effort put forth by both player A and player B, so MBA depends on both eA and eB. Therefore, we can solve the equation above for A’s choice of effort as a function of B’s effort level, that is, A’s reaction function:

eA = RA(eB).

If you were to do the same thing for player B, you’d obtain a similar equation:

eB = RB(eA).

As we have seen many times, the Nash equilibrium occurs where the reaction functions intersect. This is shown in the figure below.

[Figure: reaction functions RA and RB in (eA, eB) space, crossing at the Nash equilibrium (eANE, eBNE).]

To summarize: since players on the team receive only a fraction of the benefit of the work they put in, everyone will free ride, putting in less effort than is in the team’s best interest. As a result, total effort, and thus total surplus, will be below the efficient level. That means each player’s individual surplus when they play in a narrowly self interested way is LESS than it would be if each instead played for the good of the team. Thus, free riding in team production situations is very similar to testifying in the prisoner’s dilemma. As the number of members on the team grows, this problem gets worse, since each player receives only 1/n of their marginal benefit. The shares


need not be equal, but if one player receives more than 1/n, another must receive less than 1/n.

Example
Suppose V = 8eA^0.5 eB^0.5, cA(eA) = (1/4)eA^2, and cB(eB) = (1/4)eB^2. Total surplus is

STOT = 8eA^0.5 eB^0.5 − (1/4)eA^2 − (1/4)eB^2.

Maximizing with respect to player A’s effort gives

∂STOT/∂eA = 4(eB/eA)^0.5 − (1/2)eA = 0,

and maximizing with respect to player B’s effort gives

∂STOT/∂eB = 4(eA/eB)^0.5 − (1/2)eB = 0.

This problem is symmetric, so eA = eB in the solution to the two equations above. Using that fact and substituting into the first partial derivative gives

4(1)^0.5 − (1/2)eA = 0, so eA = 8 = eB.

At the efficient joint solution, both A and B put forth 8 units of effort, the gross value of the project is V = 8(8)^0.5(8)^0.5 = 64, and total surplus is

STOT = 64 − (1/4)(8)^2 − (1/4)(8)^2 = 32.

Since each player gets half the value and incurs the whole cost of his own effort, A’s individual surplus is 64/2 − (1/4)(8)^2 = 16, and B’s surplus is the same, since the problem is symmetric. This is the solution if both players cooperate and put forth the efficient level of effort.

Now suppose the two players do not cooperate; instead each maximizes his own individual surplus. Player A’s surplus is

SA = (1/2)·8eA^0.5 eB^0.5 − (1/4)eA^2,

and maximizing gives

dSA/deA = 2(eB/eA)^0.5 − (1/2)eA = 0.

Solving this first order condition for A’s effort as a function of B’s gives A’s reaction function: (eB/eA)^0.5 = (1/4)eA implies eB = (1/16)eA^3, so eA = 16^(1/3) eB^(1/3). Player B’s reaction function is found the same way. Since the game is symmetric, eA = eB in the Nash equilibrium; substituting into the first order condition above gives

2(1)^0.5 − (1/2)eA = 0, so eA = eB = 4.

Value in this instance is V = 8(4)^0.5(4)^0.5 = 32, so player A’s surplus is 32/2 − (1/4)(4)^2 = 12. Player B’s surplus is the same, and total surplus is STOT = 24. By not working together and instead maximizing their own individual surplus, the players end up worse off, individually and in total.
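The equilibrium above is easy to verify numerically. The sketch below is our own illustration: the functional forms come from the example, while the function names and the use of scipy are assumptions, not part of the text.

```python
# A numerical check of the team production example. The functional forms
# V = 8*(eA*eB)**0.5 and c(e) = e**2/4 come from the example above.
from scipy.optimize import minimize_scalar

def SA(eA, eB):
    # A's individual surplus: half the value, all of his own effort cost.
    return 8 * (eA * eB) ** 0.5 / 2 - eA**2 / 4

def best_response(eB):
    # A's best response to B's effort: maximize SA over eA on [0, 20].
    res = minimize_scalar(lambda eA: -SA(eA, eB), bounds=(0, 20), method="bounded")
    return res.x

print(best_response(8))    # ~5.0: even if B works efficiently, A slacks
print(best_response(4))    # ~4.0: a fixed point, so eA = eB = 4 is the NE
print(SA(8, 8), SA(4, 4))  # 16.0 vs 12.0: both do worse at the equilibrium
```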

We next consider ways to overcome, or at least reduce, free riding in team situations. One option is to work only with players who have strong reputations for honesty and cooperation. Another, related option is to play repeatedly with the same partners, employing trigger strategies to obtain cooperative behavior. A third possibility is to try to overcome free riding through contracting. If effort is observable, the contract would specify that a player is not paid unless he puts forth the jointly efficient effort; total value would be split among only the players who put in the efficient effort. But effort is generally not measurable and observable; only value is. In that case, the contract can only be contingent on the total value produced. Imagine a contract that specifies a base pay level for everyone and a bonus paid only if total value equals or exceeds what is expected when everyone puts in the efficient effort. The strongest form of such a contract would pay everyone 0 unless value meets the target, in which case the value is divided among the team members. What if just one player free-rides? If the contract is enforced, no one in the group gets the bonus. So, no one has any incentive to enforce the contract; they would rather be paid a share of the bonus than none of it. Since everyone knows that no one has any incentive to enforce the contract, such a contract cannot prevent free riding. The difficulties overcoming free riding through contracting are even worse when effort is not only unobservable but value is also partially determined by random influences outside the team’s control. The final possibility for overcoming free riding is to form a firm whose shareholders are different individuals than the members of the production team.



Shareholders are “residual claimants”; that is, they have a claim to any residual profits the firm earns after everyone else is paid. In the contracting example above, if one person slacked off, nobody had any incentive to enforce the contract, which would deny them all their bonuses. Shareholders, however, would have the incentive to enforce such a contract with their workers, because they keep the value of the bonuses if gross value falls short of the target level. This is a very important reason for the existence of firms: to create a body of shareholders with the incentive to enforce incentive contracts with members of the production team.

This solution is not perfect. Imperfect information can get in the way of effectively monitoring the team; in that case, writing contracts on observed value places risk on the agents, increasing what they must be paid. It can also be costly to negotiate a contract that everyone is actually willing to sign. Having a CEO whom the production team members, the shareholders, and the shareholders’ board of directors believe in can be a tremendous advantage in this situation. Suppose the CEO can observe effort reasonably closely, but effort cannot be proven in court. Then the production team’s contracts can be simple, calling for a given base pay plus a bonus IF the CEO gives them a strong performance evaluation. As long as everyone believes the CEO will evaluate effort competently and honestly, the production team members will be willing to sign such simple contracts. As long as the shareholders also believe in the CEO’s ability and honesty, they are happy to have that sort of contract with their production team, and to enforce it in court if needed. Suppose a contract specifies base pay of $50,000 and a bonus of $50,000 if the group works hard. The explicit part of this contract is the amounts of money specified, but the contract also has an implicit part: it is up to the CEO (or whoever the monitor is) to decide how hard the group members worked. Thus, having a trustworthy CEO who is skilled in evaluating performance is critical to getting the group members to sign the contract while keeping the costs of negotiating and enforcing such contracts low.

Relationship Specific Investments and the Hold Up Problem

A relationship specific investment is an investment made solely, or at least largely, to facilitate a certain market transaction. It therefore loses all (or at least much) of its value if the transaction does not occur. Consider an electric company that builds a plant near a coal mine owned by another company and lays railroad track to get the coal from the mine to the plant. If the two firms are then unable to negotiate agreeable terms for coal prices, the investments embodied in the plant and the track are worthless. Those investments are specific to the relationship between the electric company and the mining company, and are not valuable outside of that relationship.

There are three types of relationship specificity. The first is location specificity, as in the example above: due to where the asset must be located and the inability to move it from one location to another, its value depends on a specific relationship. The second is physical asset specificity. In this case, an asset is produced in a very specialized way that is not valuable outside of the transaction it was designed to



facilitate. For example, if your firm invests in highly specialized machinery that can only be used to build parts for a particular jet, this machinery is only useful as long as you have a contract with the government to work on that jet; if the contract is rescinded, the machines become useless. The final type is human capital specificity. Many transactions between firms require managers and other workers of both firms to get to know the other company’s finances, management structure, production processes, and so on. If the relationship between the firms falls apart, all of this human capital knowledge becomes useless.

How can the specific nature of these investments lead to market failure and thus an additional role for firms? Their very specificity means no spot market exists for them; everything depends on bargaining and negotiation, meaning there are significant transactions costs, and correspondingly there is not a large group of suppliers and consumers determining market prices through their interactions. Bargaining and negotiation in this circumstance pose a glaring problem. Once the party that must undertake the relationship specific investment has actually sunk the investment, the other party KNOWS the investment has little or no value outside the transaction. As a result, the other party has them “over a barrel,” so to speak, and will attempt to renegotiate the terms of the agreement to extract more surplus for themselves. That is, if a party to a transaction sinks a lot into a relationship specific investment, they open themselves up to being “held up” by the other party down the road.

For example, suppose Michael wants to procure a measuring device that hasn’t been designed yet and is very specific to his particular needs. He approaches Susie and proposes an agreement whereby Susie invests time and money to create and design these devices, which Michael will then buy from her. Suppose Susie must invest $2,000 in time and money to create the devices, but after the initial investment the cost of producing a single device is only $10. Michael agrees to pay $40 per unit in order to compensate Susie for the initial investment, so Susie agrees. But what if, after Susie has designed the devices and begun to manufacture them, Michael approaches her and offers to pay only $11 per unit? It is too late for Susie to recoup her losses on the initial investment, and since the devices are specific to Michael’s venture, she cannot sell them to anyone else. So, she will have to agree to a renegotiated price; exactly how high that price will be depends on the bargaining power of the two parties. The important point is that the specific nature of the investment makes it unrecoverable outside of this transaction, which leaves Susie vulnerable to opportunistic behavior on Michael’s part. To protect herself, Susie will tend to under-invest in product development, since she will then have less to lose. As a result, both parties will be worse off. With relationship specificity, unless there is an explicit and readily enforceable contract in place, the party making the specific investment can be “held up” and pressured for renegotiation by the other party. Because the party making the investment can predict this type of opportunistic behavior, they will tend to sink less into the initial investment. This reduces value added and potentially makes both parties worse off compared to the jointly efficient solution. Indeed, both



parties may need to sink resources into relationship specific investment, and each can then try to exploit the other, meaning both parties may under-invest in the relationship.

Model
To model this situation, we assume there is a single transaction between a buyer and a seller, each of whom must make specific investments to facilitate the transaction. We further assume that any negotiation or renegotiation results in an even split of the surplus at stake in the negotiation. We make that assumption to keep things simple; the actual split will depend on the bargaining power of the parties, but different splits would not change the basic result, nor would more complex bargaining models. We also define the following notation.

I: The amount of the relationship specific investment; the seller’s investment is IS and the buyer’s investment is IB.

V(IS, IB): The value generated by the transaction, before subtracting the costs of the investments, as a function of the investments made by the seller and the buyer.

Total surplus, or value added, is the value generated by the transaction minus the costs of the investments:

STOT = V(IS, IB) − IS − IB.

If both parties cooperate, they will want to maximize total surplus. Maximizing with respect to the seller’s investment we obtain

∂STOT/∂IS = ∂V/∂IS − 1 = 0.

This just says that the seller should continue to invest until the last dollar he invests generates exactly one dollar of value added. Maximizing with respect to the buyer’s investment gives a similar result:

∂STOT/∂IB = ∂V/∂IB − 1 = 0.

Working together, both buyer and seller should invest until the marginal benefit of investment equals the marginal cost of investment. We denote those jointly efficient investment levels IS* and IB*. What if both parties anticipate renegotiation after the investment is sunk (the hold-up problem)? Then each party will want to maximize their own individual surplus, as opposed to maximizing total surplus. Remember, we have assumed that any later renegotiation results in each party keeping half of what is at stake in the negotiation. Then the buyer’s individual surplus is the share of the value he would take away from the renegotiation less his investment, or



SB = V(IS, IB)/2 − IB.

Maximizing with respect to the buyer’s investment gives:

∂SB/∂IB = MBB/2 − 1 = 0.

This means the buyer will invest only up to the point where half the marginal benefit of his investment equals his marginal cost; in other words, he will under-invest. The same is true for the seller. The figure below shows the marginal benefit of the buyer’s investment assuming the seller makes the jointly efficient investment. The level of investment chosen when the buyer maximizes his own surplus (ÎB) is less than the level chosen when the buyer maximizes total surplus (IB*).

[Figure: MBB and ½MBB against I, with marginal cost 1 as a horizontal line; ÎB, where ½MBB crosses 1, lies below IB*.]
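To make the under-investment concrete, here is a minimal numeric sketch. The value function V(IS, IB) = 8(IS^0.5 + IB^0.5) is our assumed example, not the text’s; it keeps each party’s first order condition one-dimensional, and the use of scipy is likewise an assumption.

```python
# A minimal numeric illustration of hold-up under-investment, under the
# assumed value function V(IS, IB) = 8*(IS**0.5 + IB**0.5).
from scipy.optimize import minimize_scalar

def V(IS, IB):
    return 8 * (IS**0.5 + IB**0.5)

def joint_choice(other):
    # Efficient: invest until dV/dI = 1 (a dollar in returns a dollar).
    res = minimize_scalar(lambda I: -(V(I, other) - I), bounds=(0, 100), method="bounded")
    return res.x

def nash_choice(other):
    # Anticipating a 50/50 renegotiation split, each party keeps half
    # the marginal value of its investment but bears the full cost.
    res = minimize_scalar(lambda I: -(V(I, other) / 2 - I), bounds=(0, 100), method="bounded")
    return res.x

print(joint_choice(16))  # ~16: efficient investment (4/sqrt(I) = 1)
print(nash_choice(4))    # ~4: equilibrium investment (2/sqrt(I) = 1)
print(V(16, 16) - 32)    # surplus 32 at the efficient investments
print(V(4, 4) - 8)       # surplus 24 at the Nash equilibrium: value lost
```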

Assuming both the buyer and seller maximize their own individual surplus, solving the first order condition above gives the buyer’s investment as a function of the seller’s choice (the buyer’s reaction function). The seller’s reaction function is found the same way. The figure below depicts the Nash equilibrium investment levels. If each party anticipates renegotiation and, as a result, acts in their own self-interest, both will invest less than the jointly efficient level, both to protect themselves from opportunistic behavior by the other player AND to try to take advantage of the other player themselves.

[Figure: reaction functions RS and RB crossing at the Nash equilibrium (IBNE, ISNE), below the efficient investment levels.]

What can be done to overcome the hold-up problem? One way is for the buyer to have two sources to buy from (known as dual sourcing) and for the seller to have two buyers. Having multiple buyers or sellers can limit the threat of opportunistic behavior and increase your bargaining power, but it can also raise your costs through duplication. Another option is to use contracts. A well written contract can rule out renegotiation, and effective contracts can also reduce the costs of future negotiation by standardizing the process. There are some potential disadvantages, however. Contracting can be costly, and often requires an entire legal department for it to be handled correctly. Also, being tied down to a rigid contract means that you have no flexibility to change any of the terms. If



further research reveals that you no longer have any need for the devices you originally contracted for, you may wish to change the terms for reasons that have nothing to do with opportunistic behavior; but the contract has to be airtight to properly solve the hold-up problem, so you’re stuck. The question then becomes balancing the marginal benefit of a longer, more complete contract against its marginal cost. If avoiding renegotiation is critical, a longer contract is preferred; but because the cost of writing the contract increases with its length as contingencies are added for every possible future outcome, a shorter contract may be preferred when flexibility matters.

If neither of these methods solves the hold-up problem, the only other alternative is to vertically integrate and produce the inputs yourself. This creates an upstream division that takes raw materials and converts them into parts, which the downstream division then buys and converts into the final product. One problem that results is that the new firm loses the specialization it had; since the firm now has several areas of production, management must process much more information in order to run the firm efficiently. Another issue that may emerge is an internal form of the hold-up problem: if there is relationship specific investment between the upstream division and the downstream division, the divisions now have an incentive to try to take advantage of one another. However, there is no worry about enforcing contracts between the divisions in court; the CEO assumes this role. So, assuming you have a trustworthy and knowledgeable CEO who can make good decisions with the entire firm’s interest in mind, it is enough for the CEO to directly manage the divisions’ actions and hold both divisions to the firm’s internal agreements.

Double Marginalization

The third and final type of spot market failure leading to a role for firms which we will consider is known as double marginalization. It occurs when multiple firms in a supply chain have significant market power. Each adds its own profit margin, resulting in a final price higher than would be charged if all the firms jointly chose prices to maximize profits. Profits are lower in total, final prices are higher, and final output and consumer surplus are lower.

Consider the simplest possible supply chain: a single upstream firm produces parts for a single downstream firm, which buys these parts and produces the final product for the end consumer (both are monopolists). Assume the downstream firm needs one part from the upstream firm to produce one unit of the final product. An example would be car manufacturing, where the upstream firm produces engines for the downstream firm, and the downstream firm uses one engine to make one car. Then the total cost of making q cars is the upstream cost of producing q engines plus the downstream cost of using the engines to produce q cars, or

CTOT(q) = CU(q) + CD(q),

and the total marginal cost is



MCTOT = MCU + MCD .

To maximize total profit, we want the marginal revenue of our final product to be equal to the marginal cost of the product. Since the downstream firm sells the final product, MRD is the marginal revenue. Thus, to maximize total profit,

MRD = MCTOT .

Now, consider the upstream firm. They are selling products to the downstream firm, and assume that they are the only supplier; thus, they are a monopoly. If they maximize their own profit, they set their marginal revenue equal to their marginal cost:

MRU = MCU .

Since they are a monopoly, the price charged exceeds their marginal cost:

pU > MCU .

What now is the effective marginal cost to the downstream firm? The downstream firm is buying products from the upstream firm at price pU, and assembling them into end products at a cost of MCD; so their effective marginal cost is

MCD′ = MCD + pU .

If the downstream firm maximizes their own profit, they set their perceived marginal cost equal to their marginal revenue:

MRD = MCD′ = MCD + pU.

Observing that pU > MCU, we get

MRD > MCU + MCD .

This says marginal revenue from the last unit sold is higher than the total marginal production cost. This is because the upstream firm maximizes profit by charging a price that’s higher than their marginal cost, and so does the downstream firm. In effect, the downstream firm is buying a product that’s already been marked up, and then marking it up again upon selling it to the end customer ‐ which is where the term double marginalization comes from. What this means for the final price of the downstream firm’s end product is that it will be higher than the price that maximizes total profits. If we look at the downstream firm’s profit function, they face demand represented by p(q) and have costs of production CD(q) as well as the costs of buying the upstream firm’s components at pU; so, their profit function is

πD = p(q)q − pU q − CD (q) .

Maximizing we get

MR − pU − MCD = 0.


If we solve for pU we get

pU = MR − MCD .

Thus, we have a function that describes the price that the upstream firm can charge the downstream firm; in other words, we have an inverse demand function for the upstream firm. This says that the highest price the upstream firm can charge is the difference between the downstream firm’s marginal revenue and their marginal cost. The figure to the right illustrates this scenario graphically. The upstream firm maximizes profit, setting MRU equal to MCU. Thus, they sell q components to the downstream firm at a price of pU, which is determined by their demand curve obtained from the previous discussion (pU = MR – MCD). The downstream firm’s effective marginal cost is the price they pay for the component plus their marginal cost of production (pU + MCD), and when they maximize profit they set their effective marginal cost equal to their marginal revenue. This gives q final products sold at a price of pD.

[Figure: the upstream firm faces derived demand pU = MR − MCD with marginal revenue MRU; MRU = MCU pins down q and pU. The downstream firm’s effective marginal cost pU + MCD, set equal to MR, gives the final price pD on the demand curve p(q).]

Now suppose these two firms vertically integrate, thereby avoiding the double marginalization problem. Then the total marginal cost of producing the end product is just MCTOT = MCU + MCD. The integrated firm maximizes profit by setting MR equal to MCTOT, ending up with a quantity of q* units at a price of p*.

[Figure: with integration, MR = MCTOT = MCU + MCD gives quantity q* > q and price p* < pD.]

Comparing the two cases, firms acting individually add two mark-ups to the same product and end up charging a price higher than the one that maximizes total profit.



Example
Final demand is p = 11 − q/4. The upstream firm’s cost of production is CU = 0.2q, and the downstream firm’s cost of production is CD = 0.8q. Begin by finding the upstream firm’s inverse demand function, starting with the downstream firm’s profit function:

πD = (11 − q/4)q − pU q − 0.8q.

Maximizing,

dπD/dq = 11 − q/2 − pU − 0.8 = 0,

and solving for pU we obtain the upstream firm’s inverse demand curve:

pU = 10.2 − q/2.

Now the upstream firm maximizes its own profit. Its profit function is

πU = (10.2 − q/2)q − 0.2q.

Maximizing,

dπU/dq = 10.2 − q − 0.2 = 0, so q = 10.

The component price is pU = 10.2 − 10/2 = 5.20, and the final price of the end product is p = 11 − 10/4 = 8.50. Profits are

πU = (5.2 − 0.2)10 = 50 and πD = (8.5 − 5.2 − 0.8)10 = 25,

so total profit is πTOT = 75.

If the firms were to merge, the profit function for the combined firm would be

π = (11 − q/4)q − 0.8q − 0.2q.

Maximizing gives

dπ/dq = 11 − q/2 − 1 = 0, so q = 20, p = 11 − 20/4 = 6, and πTOT = (6 − 1)20 = 100.

So, prices are lower, quantity is higher, and total profits are higher when the firms cooperate than when they operate independently.
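The example’s two regimes can also be checked symbolically. The sketch below is our illustration, not part of the original text; it assumes the sympy library, while the demand and cost parameters are taken from the example above.

```python
# Symbolic check of the double marginalization example.
import sympy as sp

q = sp.symbols("q", positive=True)
p = 11 - q / 4                                     # final inverse demand
mc_u, mc_d = sp.Rational(1, 5), sp.Rational(4, 5)  # 0.2 and 0.8

# Upstream inverse demand: pU = MR - MC_D
MR = sp.diff(p * q, q)   # 11 - q/2
pU = MR - mc_d           # 10.2 - q/2

# Separate firms: upstream monopoly maximizes (pU - MC_U) * q
q_sep = sp.solve(sp.diff((pU - mc_u) * q, q), q)[0]
print(q_sep, p.subs(q, q_sep))   # q = 10, final price = 8.5

# Vertically integrated: maximize (p - MC_U - MC_D) * q
q_int = sp.solve(sp.diff((p - mc_u - mc_d) * q, q), q)[0]
print(q_int, p.subs(q, q_int))   # q = 20, final price = 6
```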

Just as with the other problems considered above, there are several potential ways to deal with double marginalization. One is repeated interaction: we know from our study of game theory that where cooperation may not be possible in a one-shot game, it may be possible in a repeated game. But cheating is always a possibility for one of the players, so this is not a complete solution. Another is to form a contract that specifies the price between the two firms. But, as we know, contracts don’t always work, and being tied down to a long-term contract can sometimes hurt the parties more than it benefits them.



The final way to deal with this issue is to merge the two firms into a single firm. This is not without its limitations: after the merger, you have two divisions, and if the managers of each division are paid based on how well their division performs, they will be incentivized to increase the profit of their division rather than that of the entire firm. But if the CEO can ensure that the price the upstream division charges the downstream division is just the upstream division’s marginal cost (that is, pU = MCU), the issue is resolved. This is known as transfer pricing, and the idea is that within a firm, transfer prices should reflect marginal costs as opposed to marked-up values.
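To illustrate with the numbers from the example above: if the CEO sets the transfer price at pU = MCU = 0.2, the downstream division’s effective marginal cost is 0.2 + 0.8 = 1, so it sets 11 − q/2 = 1, giving q = 20 and p = 6. That is exactly the vertically integrated outcome, with total profit of 100 rather than 75.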

The Role of the Firm

In summary, creating a firm is a last resort that can be used to avoid problems encountered when trying to procure inputs from either the spot market or through contracting. The firm is a collection of agreements between agents (management and workers) and principals (stockholders and management) that defines who is the final arbiter of disputes. The reason a firm is a last resort is that by creating a firm, you lose specialization and the division of labor. Nevertheless, the firm is a unique solution to problems inherent in dealing with complex contracting environments, unique products, transactions costs, and information problems. It targets areas of a free market economy that don’t function as they are supposed to, internalizes the inefficiencies, solves them using a centralized structure, and creates residual claimants to enforce agreements and profit from them. This is summarized in the decision flow below.

High transactions costs? Team production? Large specific investments? Too much market power up- or downstream?
  No → use the spot market.
  Yes → Complex contracting environment? High degree of uncertainty? Unenforceable contracts?
    No → negotiate a contract.
    Yes → integrate the activity into the firm.


Chapter 21 Terminology

The following is a list of terms that you should know in order to discuss and apply the material from this chapter.

Spot Market – A free market economy in its purest sense. People go to this market to determine the market price of whatever input they need and buy as many units as they require. It functions well as long as there is perfect information, products are standardized, and transactions costs are low.

Free Riding – A problem that occurs in team production when a team member fails to work hard, reducing his individual costs but receiving the same benefit from everyone else’s hard work. It stems from team production situations in which individual team members bear all the costs of an action but receive only a small share of the benefit.

Relationship-Specific Investment – An investment made solely for the purpose of facilitating a certain market transaction, which loses all of its value if the transaction does not occur. The three types are location, physical asset, and human capital specificity.

Hold-Up Problem – The problem that arises because the party making a relationship-specific investment is subject to opportunistic behavior by the other party to the contract. Unless a rigid, explicit, enforceable contract is in place, the party making the investment can be “held up” and pressured for renegotiation by the other party. Because the investing party can predict this behavior, they tend to sink less money into the initial investment, causing the transaction to generate less value than it otherwise could.

Vertical Integration – A strategy in which a firm uses a hierarchical structure to produce the parts of a product itself, each at a different level of the hierarchy, instead of buying them in the spot market or from other firms. This occurs when a downstream and an upstream firm merge.

Double Marginalization – The problem that occurs when both a downstream and an upstream firm (unmerged) mark up price over marginal cost. This causes the price paid by consumers to exceed the price that maximizes the firms’ joint profit.


