The Neyman-Pearson Theory as Decision Theory, and as Inference Theory


The Neyman-Pearson Theory as Decision Theory, and as Inference Theory; With a Criticism of the Lindley-Savage Argument for Bayesian Theory
Author(s): Allan Birnbaum
Source: Synthese, Vol. 36, No. 1, Foundations of Probability and Statistics, Part I (Sep., 1977), pp. 19-49
Published by: Springer
Stable URL: http://www.jstor.org/stable/20115212



ALLAN BIRNBAUM*

THE NEYMAN-PEARSON THEORY AS DECISION THEORY, AND AS INFERENCE THEORY; WITH A CRITICISM OF THE LINDLEY-SAVAGE ARGUMENT FOR BAYESIAN THEORY

1. INTRODUCTION AND SUMMARY

The concept of a decision, which is basic in the theories of Neyman Pearson, Wald, and Savage, has been judged obscure or inappropriate when applied to interpretations of data in scientific research, by Fisher, Cox, Tukey, and other writers. This point is basic for most statistical practice, which is based on applications of methods derived in the Neyman-Pearson theory or analogous applications of such methods as least squares and maximum likelihood.

Two contrasting interpretations of the decision concept are formulated: behavioral, applicable to 'decisions' in a concrete literal sense as in acceptance sampling; and evidential, applicable to 'decisions' such as 'reject $H_1$' in a research context, where the pattern and strength of statistical evidence concerning statistical hypotheses is of central interest. Typical standard practice is characterized as based on the confidence concept of statistical evidence, which is defined in terms of evidential interpretations of the 'decisions' of decision theory. These concepts are illustrated by simple formal examples with interpretations in genetic research, and are traced in the writings of Neyman, Pearson, and other writers. The Lindley-Savage argument for Bayesian theory is shown to have no direct cogency as a criticism of typical standard practice, since it is based on a behavioral, not an evidential, interpretation of decisions.

2. TWO INTERPRETATIONS OF 'DECISIONS'

Statistical decision problems are the subject of major theories of modern statistics, and have been developed with great precision and generality on the mathematical side. But in the view of many applied and theoretical statisticians, the scope and interpretation of decision theories has remained obscure or doubtful in connection with interpretations of data in typical scientific research situations.

Synthese 36 (1977) 19-49. All Rights Reserved. Copyright © 1977 by D. Reidel Publishing Company, Dordrecht, Holland.

The reason for concern here is that most statistical methods applied to research data have been given their most systematic mathematical justification within the Neyman-Pearson theory; and that theory in turn has been given its most systematic mathematical development within the (non-Bayesian) statistical decision theory initiated by Wald. In this development the alternative statistical hypotheses which may be 'rejected' or 'accepted' on the basis of a testing procedure are identified with the respective 'decisions' appearing in the formal model of a decision problem. Similarly, each confidence interval which may be determined by an estimation procedure is identified with one of the 'decisions' of a model.

This leads to questions about the scope and interpretation of the 'decision' concept which have been discussed by a number of writers: In what sense, if any, is it appropriate to regard the results of typical scientific data analysis based on standard methods of testing and estimation as statistical decisions?

We shall treat this question in a way which is self-contained, and more systematic in some respects than previous discussions. Our intention is to complement and clarify previous discussions in certain respects, without attempting to summarize or to review them. The interested reader is urged to read or re-read such earlier discussions, particularly those of Tukey (1960), Cox (1958, p. 354), and others cited below.

The terms 'decide' and 'decision' were used heavily by Neyman and Pearson in the series of joint papers which initiated their theory, notably in the exploratory preliminary paper of 1928, and in the 1933 paper in which problems of testing statistical hypotheses were first formulated in a way which can be regarded as a case of statistical decision problems.

A frequently cited ('paradigm') type of application of statistical decision theories and of the Neyman-Pearson theory is that of industrial acceptance sampling (Neyman and Pearson, 1936, p. 204; Wald, 1950, pp. 2-3): A lamp manufacturer must decide whether or not to place a batch of lamps on the market, on the basis of tests on a sample from the batch.


The simplest models of decision problems are characterized fully, for our present purposes of discussion, by schemas of the following form:

Simple hypotheses: $H_1$, $H_2$

Possible decisions: $d_1$, $d_2$

Error probabilities: $\alpha = \mathrm{Prob}[d_1 \mid H_1]$, $\beta = \mathrm{Prob}[d_2 \mid H_2]$

A simple hypothesis is any probability distribution which may be defined over the range of possible outcomes (the sample space) of an experiment or observational procedure. For example, the lamp manufacturer may be interested in the simple hypothesis $H_1$ that a batch of lamps contains exactly 4% defective lamps, and in the alternative simple hypothesis $H_2$ that the batch contains exactly 10% defectives, possibly because a batch is considered definitely good if it has 4% or fewer defectives, and is considered definitely bad if it has 10% or more defectives.

For a given batch, his possible decisions are:

$d_1$: withhold the batch from the market; and

$d_2$: place the batch on the market.

The performance of any decision function (that is, any rule for using data on a sample of lamps from the batch to arrive at a decision $d_1$ or $d_2$) is characterized fully, under $H_1$ and $H_2$, by the respective error probabilities $\alpha$ and $\beta$ defined in the schema. (An example of a decision function here is the rule: Place the batch on the market if and only if fewer than 3 defectives are found in a random sample of 25 lamps.)
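As a rough numerical illustration of the example rule just stated, the following sketch computes $\alpha$ and $\beta$ for it. It rests on an assumption not stated in the text: that the number of defectives in the sample of 25 is binomially distributed (for instance because the batch is large relative to the sample). The function names are illustrative only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def error_probabilities(n=25, cutoff=3, p1=0.04, p2=0.10):
    """Error probabilities of the rule 'place on market iff defectives < cutoff'.

    alpha = Prob[d1 | H1]: batch withheld although it has 4% defectives.
    beta  = Prob[d2 | H2]: batch marketed although it has 10% defectives.
    The binomial sampling model is an assumption added for this sketch.
    """
    alpha = 1.0 - binom_cdf(cutoff - 1, n, p1)  # 3 or more defectives under H1
    beta = binom_cdf(cutoff - 1, n, p2)         # fewer than 3 defectives under H2
    return alpha, beta


alpha, beta = error_probabilities()
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Under these assumptions the rule has an error probability of the first kind well below its error probability of the second kind; a different cutoff or sample size would shift the balance between the two.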

Consider the interpretation of the decisions $d_1$ and $d_2$ which appear in the schema, in its application to the problem of the lamp manufacturer. When the manufacturer places a batch of lamps on the market, he performs an action. If he does so after considering one or more alternative possible actions, as in our example, then he has also taken a decision in favor of that action.

Here the terms 'decision' and 'action' refer to the behavior of the manufacturer in a simple direct and literal way. We shall use the term behavioral interpretation of the decision concept to refer to any comparably simple, direct, and literal interpretation of a 'decision' appearing in a formal model of a decision problem.1

The behavioral interpretation must be criticized and rejected, in the view of many statisticians and investigators, when such a schema and model are applied in a typical context of scientific research in connection with standard methods of data analysis. Convenient examples may be drawn from genetic linkage studies, which have the general scientific goal of extending the knowledge of the 'chromosome map' which largely characterizes a species or strain in classical Mendelian genetics.2

Consider an investigator who judges that his linkage studies provide very strong evidence that two genetic loci lie on the same chromosome (with the usual appreciation that future studies could conceivably reverse his judgement); and who reports his conclusion, together with a summary of his data and his interpretation of it, based in part on use of a test determined by applying the Neyman-Pearson theory (as in Morton, 1955, or Smith, 1953, pp. 180-3), in a research journal.

His conclusion favoring the scientific hypothesis of linkage corresponds in some way to a 'decision' $d_1$ in a schema like that above, where now $H_1$ is the statistical hypothesis characterizing no linkage. It is the nature of this correspondence which we wish to examine carefully.

3. STATISTICAL EVIDENCE, AND ITS INADEQUATE REPRESENTATION BY 'DECISIONS' OF DECISION THEORY

The problem of testing statistical hypotheses is often described (in the Neyman-Pearson papers and elsewhere) as a problem of deciding whether or not to 'reject a statistical hypothesis' such as $H_1$ (e.g. Neyman and Pearson, 1928, p. 1; 1933, p. 291). This suggests the interpretation given by most writers who formulate problems of testing as decision problems:

$d_1$: reject $H_1$

$d_2$: do not reject $H_1$.

But this interpretation leads immediately to the question: What is the interpretation of 'reject $H_1$' in, for example, the situation of the investigator of our example who concluded that linkage was present?

Even if the geneticist uses typical terminology such as 'reject $H_1$, the hypothesis of no linkage,' neither he nor his colleagues understand that he is making a decision in any literal and unqualified sense which could be given a behavioral interpretation closely comparable with that of the lamp manufacturer's decision in the example above.

Rather, the decision-like term 'reject' expresses here an interpretation of the statistical evidence, as giving appreciable but limited support to one of the alternative statistical hypotheses. This evidential interpretation of the experimental results is in principle based on a complete schema of the kind indicated above, even when this is only implicit. In this essential respect, the identification suggested above between 'reject $H_1$' and the single element $d_1$ of the schema is inadequate, and is misleading when taken out of the context of the schema. Such cases of statistical evidence are adequately represented by symbols like

$d_1^*$: (reject $H_1$ for $H_2$, $\alpha$, $\beta$)

and

$d_2^*$: (reject $H_2$ for $H_1$, $\alpha$, $\beta$),

each of which carries an indication of the complete schema which serves as the conceptual frame of reference for the interpretation of statistical evidence here.

The symbols $d_1^*$ and $d_2^*$ represent in prototype typical interpretations and reports of data treated by standard statistical methods in scientific research contexts.

We shall use the term evidential interpretation of the decision concept to refer to such applications of models of decision problems; and we shall use the term confidence concept of statistical evidence to refer to such interpretations of statistical evidence.

In the view of this writer and some others, although typical applications of standard statistical methods in research are of the kind we have illustrated, the central concepts guiding such applications and interpretations (for which we have introduced the terms in italics above) have not been defined within any precise systematic theory of statistical inference. Rather, these concepts exist and play their basic roles largely implicitly and unsystematically, in guiding applications and interpretations of standard methods, and in guiding the development of new statistical methods.

We shall not offer any precise theoretical account of these concepts, nor even claim that such an account can be given. Our aims are limited to illustrating the existence and wide scope of the confidence concept, and clarifying some of its features.

The confidence concept seems to be in part a primitive intuitive concept of statistical evidence associated with schemas of the above kind, which may be expressed in the following prototypic formulation:

(Conf): A concept of statistical evidence is not plausible unless it finds 'strong evidence for $H_2$ as against $H_1$' with small probability ($\alpha$) when $H_1$ is true, and with much larger probability ($1-\beta$) when $H_2$ is true.
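The two probabilities named in (Conf) can be read directly off the schema. The sketch below tabulates them for a few pairs of error probabilities (the pairs are those appearing in the examples that follow). It is only an illustration: the function names are hypothetical, and since the text does not say how much larger 'much larger' must be, the check tests the bare ordering only.

```python
def outcome_probabilities(alpha, beta):
    """Probability of each outcome of the schema under each simple hypothesis.

    Returns a dict mapping outcome -> (Prob under H1, Prob under H2), with
    alpha = Prob[d1 | H1] and beta = Prob[d2 | H2] as in the schema.
    """
    return {
        "reject H1 for H2": (alpha, 1.0 - beta),
        "reject H2 for H1": (1.0 - alpha, beta),
    }


def orders_as_conf_requires(alpha, beta):
    """Bare ordering behind (Conf): evidence for H2 must be less probable
    under H1 (alpha) than under H2 (1 - beta). The threshold for 'much
    larger' is left open in the text, so no threshold is imposed here."""
    return alpha < 1.0 - beta


for alpha, beta in [(0.06, 0.08), (0.01, 0.2), (0.5, 0.5)]:
    under_h1, under_h2 = outcome_probabilities(alpha, beta)["reject H1 for H2"]
    print((alpha, beta), under_h1, under_h2, orders_as_conf_requires(alpha, beta))
```

The pair (0.5, 0.5) fails even this minimal ordering, since the outcome 'reject $H_1$ for $H_2$' is then equally probable under both hypotheses.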

Examples. The following are simple examples of the confidence concept of statistical evidence. They may be thought of in the context of the investigation of genetic linkage described above. The interpretations of statistical evidence are expressed in the first person because they illustrate in simple cases the writer's own practice and thinking concerning statistical evidence, based in part on some experience as an independent interpreter of genetic data and developer of some new methods of data analysis (Birnbaum, 1972), as well as on Mendelian theory and extensive observation and analysis of general statistical practice and thinking. In my view these examples, and their interpretations in following sections, are typical of widespread statistical thought and practice, with the qualification that they are given here with a degree and style of explicit expression which is unusual. The interested reader will of course make an independent judgment about this.

The first person form is somewhat analogous to the usage of Savage (1954), whose Bayesian decision theory is developed from the standpoint of a generic rational person 'you'. In a following section these examples will be referred to in the course of a critical discussion of some assumptions of Savage's and Wald's decision theories. Symbols of the form $d_1^*$ and $d_2^*$ introduced above are used to present the examples.

(1) I interpret

(reject $H_1$ for $H_2$, 0.06, 0.08)

as strong statistical evidence for $H_2$ as against $H_1$. Similarly I interpret

(reject $H_2$ for $H_1$, 0.06, 0.08)

as strong statistical evidence for $H_1$ as against $H_2$.

(2) I interpret

(reject $H_1$ for $H_2$, 0, 0.2)

as conclusive evidence for $H_2$ as against $H_1$. Here the zero value of the error probability of the first kind indicates that the observational results are incompatible with $H_1$.

(3) I interpret

(reject $H_1$ for $H_2$, 0.01, 0.2)

as very strong statistical evidence for $H_2$ as against $H_1$.

(4) I interpret

(reject $H_2$ for $H_1$, 0, 0.2)

as weak statistical evidence for $H_1$ as against $H_2$. Here the relatively large value 0.2 of the error probability of the second kind suggests relative skepticism concerning this evidence against $H_2$.

(5) I interpret

(reject $H_1$ for $H_2$, 0.5, 0.5)

as worthless statistical evidence. It is no more relevant to the statistical hypotheses considered than is the toss of a fair coin, since the error probabilities (0.5, 0.5) also represent a model of a toss of a fair coin, with one side labeled 'reject $H_1$' and the other 'reject $H_2$'. If such a case arose in practice, our comments would lead us to judge the experiment, or at least the test adopted, to be worthless.

The distinction between the two interpretations of 'decision' may be epitomized (as Bernard Norton has pointed out) by contrasting the ordinary usages:3

behavioral: 'decide to' act in a certain way,

and

evidential: 'decide that' a certain hypothesis is true or is supported by strong evidence.

Concerning the different (pragmatist) identification of 'decide that A is true or well supported' with 'decide to act as if A is true or well supported', it will be clear from discussion above and below that we reject any such simple identification, and regard conclusions and statistical evidence as having autonomous status and value.

The preceding considerations were emphasized clearly though less formally by Cox (1958, p. 354) as follows:

it might be argued that in making an inference we are 'deciding' to make a statement of a certain type about the populations and that, therefore, provided the word decision is not interpreted too narrowly, the study of statistical decisions embraces that of inferences. The point here is that one of the main general problems of statistical inference consists in deciding what types of statement can usefully be made and exactly what they mean. In statistical decision theory, on the other hand, the possible decisions are considered as already specified.

Further analysis of the distinctions between the two interpretations of the 'decisions' of decision theory is provided in those sections below which treat certain assumptions underlying Savage's and Wald's decision theories. In particular, it is shown that if one wishes to regard evidential statements represented, for example, by

$d_1^*$: (reject $H_1$ for $H_2$, 0.05, 0.05)

as 'decisions' in a formal model of a decision problem, then certain basic assumptions of statistical decision theories are incompatible with certain basic properties and meanings of those evidential statements.

4. STATISTICAL EVIDENCE AS ONE AMONG SEVERAL CONSIDERATIONS REGARDING SUPPORT OF SCIENTIFIC CONCLUSIONS

As Tukey (1960) has emphasized, a conclusion reached in a scientific investigation, such as the conclusion of our geneticist that two loci are linked, requires not only

(a) statistical evidence of sufficient strength concerning the statistical hypotheses of interest.

In addition the investigator (or community of investigators) must judge

(b) the adequacy of the mathematical-statistical model, which serves as the conceptual frame of reference for the interpretation of the statistical evidence, to represent the research situation in relevant respects; and

(c) the compatibility with other knowledge and evidence of a conclusion that may be supported by statistical evidence provided by the current investigation (for example, strong statistical evidence against the statistical hypotheses representing no linkage).4

These important considerations prevent us from regarding a scientific conclusion as being determined in any simple or exclusive way by the statistical evidence which may support it.

The Neyman-Pearson theory introduced a kind of formal symmetry into the formulation of problems of testing statistical hypotheses, by requiring explicit specification of alternative statistical hypotheses and error probabilities of the second kind (e.g. $H_2$ and $\beta$ in our schema) to complement the traditional specification (e.g. just $H_1$ and $\alpha$ in our schema). But in many early and modern applications of statistical tests, there is a definite lack of symmetry in the status of the alternative statistical hypotheses considered, related to a lack of symmetry in the status or significance of corresponding scientific hypotheses or possible conclusions. For example, in many cases one scientific hypothesis is regarded as established on the basis of current knowledge, or at least as acceptable or plausible, unless and until sufficiently clear and strong evidence against it appears. Clearly such considerations lie outside the scope of mathematical statistical models and statistical evidence in the sense discussed above, but rather in the scope of the scientific background knowledge and judgment referred to in (b) and (c) above.

In traditional formulations of testing problems which preceded the Neyman-Pearson theory and which continue to appear prominently in various applications in applied and theoretical statistics, it may be more or less plausible to suppose that there is implicit, though not explicit, reference to alternative statistical hypotheses and corresponding error probabilities, as an implicit part of the basis for choice and reasonable interpretation of a test statistic; and possibly to suppose also that there is implicit if not explicit reference to possible alternative scientific hypotheses or possible conclusions corresponding to such implicit statistical hypotheses. The scope of the present paper does not extend to tests in such traditional formulations except to the extent that they may be regarded as being interpreted at least in principle with an implicit, if not explicit, reference in plausible application to some alternative statistical hypotheses. Such terms as 'standard statistical methods' and 'standard methods of testing statistical hypotheses', as used throughout this paper, must be understood with this important qualification to avoid confusion.

5. THE THEORETICAL AMBIGUITY OF THE NEYMAN-PEARSON THEORY

The Neyman-Pearson theory is interpretable in its mathematical form as a special restricted part of general statistical decision theory, as we have indicated above and will elaborate further below. As to the extra-mathematical interpretations and theory, which relate that mathematical form to applications, one may say that there are two Neyman-Pearson theories:

One is based on behavioral interpretations of the decision concept, and has been elaborated by Neyman in terms of his concept of inductive behavior as mentioned above. It is difficult or (in the view of the present writer and some others) impossible to discover or devise clear plausible examples of this interpretation in typical scientific research situations where standard methods are applied. (The interested reader will make an independent judgement about this, and may wish to consider the extensive and important contributions of Neyman himself to the interpretation of scientific data in several research areas.)

The second theory which makes use of the mathematical structure of the Neyman-Pearson theory is based on evidential interpretations of the 'decisions' in that theory, and has as its central concept what we have called the confidence concept of statistical evidence - a concept whose essential role is recognizable throughout typical research applications and interpretations of standard methods, but a concept which has not been elaborated in any systematic theory of statistical inference.


Since even the existence of this important distinction between two interpretations of the mathematical structure of the Neyman-Pearson theory is not very widely nor clearly appreciated, much of the theoretical obscurity and misunderstanding found in the statistical literature is not surprising. A simple step toward limiting this confusion and obscurity would be to make consistent use of terms which keep the distinction in view whenever necessary, such as 'confidence concept' and 'evidential' or 'behavioral' interpretation; and to avoid unqualified use, when ambiguity and confusion could result, of such standard terms as: the Neyman-Pearson theory (or approach, or school); and 'objectivist', 'frequentist', 'orthodox', 'classical', 'standard', and the like.

In the many applications where each interpretation seems to have some role, a sharp theoretical distinction between the two interpretations may have particular value in helping to clarify the purpose or purposes of the application and guide the adoption of appropriate methods. For example, new knowledge about a genetic linkage may have immediate value as a basis for the genetic counseling of a particular family. Here one can in principle consider two models of decision problems as having some scope in the same situation, one having 'decisions' interpreted in the literal behavioral sense (for example 'do not have another child' or 'do'); and the other model having 'decisions' with evidential interpretations (for example concerning statistical hypotheses related to possible scientific conclusions about genetic linkage).

Even if various details of the two models should correspond (for example the two decision functions adopted might, though they need not, be identical in form though different in kind of interpretation), the purposes and problems considered would be distinct, and hence properly characterized and treated by distinct theoretical concepts.

In other applications where there is a problem of decisions in the behavioral sense, one may seek conclusions (or strong statistical evidence) as a basis for making decisions judiciously. In such cases, if some formal model of a decision problem is considered to be an accurate model of the real situation in the relevant respects, one may argue that to consider conclusions (or statistical evidence) as such is at best superfluous, and at worst may distract from clear appreciation of the actual decision problem and accurate model. On the other hand, if it is not clear that any formal model of the decision problem has sufficient realism to be applied, then development of new knowledge (conclusions or statistical evidence) may naturally be sought as a basis for making decisions.5

The second example of the 1936 paper of Neyman and Pearson involves explicit consideration of both conclusions and related decisions, but is discussed so briefly and incompletely that I am unable to interpret it from the standpoint of the preceding paragraphs. No other examples of applications were discussed in the joint papers. Thus the joint papers contain no discussion of an application in which a scientific conclusion was the sole or primary object of an investigation. Various writings of E. S. Pearson (notably 1937, 1947, 1962) discuss applications in which both conclusions and decisions (in the behavioral sense) are of interest, with conclusions sought as a basis for making decisions.

6. THE CONCEPTS OF TESTS AND DECISIONS IN THE 1933 PAPER OF NEYMAN AND PEARSON

The 1933 paper of Neyman and Pearson begins (pp. 141-2) with explicit concern about the meanings of concepts and methods of testing. The authors discuss "What is the precise meaning of the words 'an efficient test of a hypothesis?' There may be several meanings."

No concept of an 'efficient test' had appeared in the preceding literature of testing, but the term 'efficient' had been introduced into mathematical statistics by Fisher in connection with his theory of estimation in the early 1920's.

Fisher's theory, with its striking mathematical power and conceptual depths and obscurities, stood in the background of the efforts of Neyman and Pearson to initiate a comparably systematic theory of tests, as they indicated in the introduction to their exploratory paper of 1928. Their plan to treat testing problems in an exact form (rather than by asymptotic approximations for the case of large samples, as Fisher had done) would eliminate some purely technical complications and thereby facilitate clarity concerning concepts such as 'efficient' or its analogues in a theory of tests.

On the side of applications, there was as much need for a systematic theory of tests as there had been for a more systematic theory of estimation, to guide investigators in choosing among alternative possible tests in problems of increasing complexity, where the common sense which had guided traditional testing practice faltered. (Neyman and Pearson began their 1930 paper with discussion of Romanovsky's 1928 paper which had given new distribution theory for several statistics for a standard testing problem, pointing out the open basic problem of "determining which is the most appropriate one to use in any given case.")

The 1933 paper supplied a definition of 'an efficient test' which is clear on the mathematical side, and is neutral in relation to the contrasting behavioral and evidential interpretations of 'decision' discussed above. An efficient test is defined as one in which the error probabilities (such as $\alpha$ and $\beta$ in our schema) are minimized (jointly in some appropriate sense). Whether evidential or behavioral interpretations of 'decisions' are in view, such minimization of error probabilities would seem to be a clearly appropriate goal. No concept of an 'efficient test' has, even now, been proposed in terms of the earlier tradition of formulating testing problems (without reference to error probabilities under alternative hypotheses). In this sense one may say that it appears to have been 'necessary' to make some change in the traditional mathematical formulation of testing problems, as a basis for introducing a concept of an 'efficient test' which might guide applications and theoretical developments.
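Why the error probabilities can only be minimized 'jointly in some appropriate sense' can be seen in a small sketch based on the lamp example of Section 2. It tabulates $(\alpha, \beta)$ for a family of cutoff rules, under an assumed binomial sampling model; the model, the range of cutoffs, and the function names are assumptions made for illustration and are not taken from the 1933 paper.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def error_pair(cutoff, n=25, p1=0.04, p2=0.10):
    """(alpha, beta) for the rule 'withhold the batch (d1) iff defectives >= cutoff'.

    H1: 4% defectives, H2: 10% defectives; binomial sampling is assumed.
    """
    alpha = sum(binom_pmf(k, n, p1) for k in range(cutoff, n + 1))  # Prob[d1 | H1]
    beta = sum(binom_pmf(k, n, p2) for k in range(cutoff))          # Prob[d2 | H2]
    return alpha, beta


pairs = {c: error_pair(c) for c in range(0, 7)}
for c, (a, b) in pairs.items():
    print(f"cutoff {c}: alpha = {a:.4f}, beta = {b:.4f}")
```

Raising the cutoff lowers $\alpha$ and raises $\beta$, so neither can be driven down without cost to the other for a fixed sample size. One common way of making the joint minimization precise is to bound $\alpha$ and then minimize $\beta$.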

In any case, Neyman and Pearson met a problem of broad theoretical and practical scope by changing some of the terms of the problem, as original investigators have frequently done in all problem areas.6

Although some change in the mathematical formulation of testing problems seems to have been necessary, in the sense just indicated, the theoretical innovation of the Neyman-Pearson theory, the behavioral interpretation of tests, was not necessary in the following sense: An evidential interpretation has been associated with typical applications of tests in scientific research investigations in all periods of their use (which dates from 1710), without apparent discontinuity during the years following 1933 when the mathematical structure of the Neyman-Pearson theory became widely accepted as the new or improved mathematical basis for the theory of tests.

This observation suggests the questions: 'What roles or functions was the behavioral interpretation intended to serve?' and 'What functions has it served?' The joint papers suggest less than clear answers, while later papers written separately by Neyman and Pearson suggest clearer answers which are different for the respective authors.

Although the 1933 paper begins, as we have noted, with concern about the meanings of concepts of testing, it discusses only a mathematical aspect of the meaning of an 'efficient test'; and the meaning of 'a test' (or a 'decision' such as 'reject $H_1$') is not discussed systematically with regard to extra-mathematical interpretations. Brief but clear and contrasting behavioral and evidential interpretations appear:

Behavioral: "Such a rule tells us nothing as to whether in a particular case H is true when" ... "accepted" ... "or false when" ... "rejected." ... "But ... if we behave according to such a rule, then in the long run we shall reject H when it is true not more, say, than once in a hundred times, and in addition we may have" analogous assurance concerning the frequency of rejections of H "when it is false." (p. 142.)

Evidential: 1. In the "method of attack ... in common use ... If P were very small, this would generally be considered as an indication that the hypothesis, H, was probably false, and vice versa." (p. 141.)

2. "Let us now for a moment consider the form in which judgements are made in practical experience. We may accept or we may reject a hypothesis with varying degrees of confidence; or we may decide to remain in doubt. But whatever conclusion is reached the following position must be recognized. If we reject $H_0$, we may reject it when it is true; if we accept $H_0$, we may be accepting it when it is false, that is to say, when really some alternative $H_t$ is true." (p. 146.)

The authors' attitude toward evidential interpretations is not made quite clear. The preceding quotation from p. 142 gives approvingly the behavioral interpretation of a test in the new mathematical formulation, as against the traditional "method of attack ... in common use" (traditional mathematical formulation, with evidential interpretation). But the quotation from p. 146 (in a discussion not linked by the authors with that of pp. 141-2) describes approvingly the evidential interpretation of a test in the new mathematical formulation.

is this apparent discrepancy An interpretation which would reconcile a as to not in intended to regard the behavioral apply interpretation sense in any direct, literal, or concrete situation of scientific research of the with an evidential incompatible interpretation a situation in a 'decisions' in question; but rather intended to apply in such way which is heuristic or hypothetical, serving to explain the inevitably with the error probabilities, associated abstract theoretical meanings

which

would

be


THE

NEYMAN-PEARSON

'decisions' such as 'reject Hx, a formal model of a decision

formal on

THEORY

33

and evidential

based interpretations Thus (test). hypothetical

problem interpretations may be regarded as playing a role in the inner theoretical core of the confidence concept.7 This interpretation of the relation between behavioral and evidential

behavioral

interpretations

seems

to that expressed

close

in various by E. S. Pearson Professor Pearson has 1962).

1937, 1947, 1955, (in particular notes which from unpublished the following quotations kindly permitted on an earlier draft of the present he wrote in April 1974, as comments terms 'behavioral' and do not appear in the 'evidential' (The paper. in their terms there the 'literal' appear places original notes; respective

writings

and

'elliptical',

which

were

used

in the earlier

version

of the present

paper.) as a practising statistician would have been what my outlook [In the 1920's and 1930's]... But to build such a structure one had to set out a mathematical you term evidential. theory ... I on the face of things, suggested a behavioral which led to rules which, interpetation. think you will pick up here and there inmy own papers signs of evidentiality, and you can say now

that we

or I should have stated clearly the difference between the behavioral we have suffered since in the way the people interpretations. Certainly ... concentrated (to an absurd extent often) on behavioral interpretations

and

evidential

interested in an application where a is when he encounters interpretation appropriate, a as of statistical in method such appears many interpretation re interpre and theoretical works, supplies his own evidential

Itmust happen evidential

an

behavioral

have

frequently

that a reader

expository tation of the given behavioral interpretation one, in order to relate the method cogently and interpretation. The 1920's and 1930's were

a period

if the writer

has not supplied

to his intended

of much

application

critical concern with

the

of terms and concepts in the possible meaningless as as of and various other well science, psychology, disciplines philosophy concerns were usually pursued in statistics. These in terms of such

meanings

and

or verificationism. as behaviorism, operationalism, Various writers applied these criteria with varying degrees of strin gency, greater stringency entailing smaller scope and importance for the doctrines

theoretical

and hypothetical concepts. the widest and most lasting

Perhaps been heightened

appreciation

qf both

influences the values

of these doctrines

have

and the limitations

of


34

ALLAN

BIRNBAUM

for the analysis and development of a discipline, along with a of the roles of essential theoretical, hypothetical, balancing appreciation and perhaps even metaphysical concepts. such criteria

7. THE STATUS OF THE CONFIDENCE CONCEPT IN THEORY AND APPLICATIONS

As mentioned above, there is no precise mathematical and theoretical system which guides closely the wide use of the confidence concept in standard practice. (It is clear that further developments can not alter this theoretical situation. Cf. Birnbaum, 1969.) Rival approaches to the interpretation of research data (notably the likelihood and Bayesian approaches) offer attractive features of systematic precision and generality; but their basic concepts fail to satisfy those who prefer the confidence concept for the kind of theoretical control it provides over the objective error probabilities of interest (appearing in schemas like that above).8

The ad hoc aspects of the confidence concept are encountered in all applications, including that of testing genetic linkage discussed above. These aspects are related to its mathematical basis in the Neyman-Pearson theory as follows. In a given problem of two simple hypotheses, the problem of minimization of error probabilities $\alpha$ and $\beta$ (solved by Neyman and Pearson in 1933) leads not to a unique best test or decision function but to a family of best tests, each of which has the smallest possible value of $\beta$ among all tests with the same (or smaller) value of $\alpha$, including for example the following points $(\alpha, \beta)$ representing respective best tests:

(0.01, 0.05), (0.02, 0.02), and (0.05, 0.01).
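The idea of a family of best tests, one for each value of $\alpha$, can be made concrete with a small numerical sketch. The model below is an assumption introduced only for illustration and is not taken from the paper: a single observation that is $N(0, 1)$ under $H_1$ and $N(\text{shift}, 1)$ under $H_2$, for which the best test at level $\alpha$ rejects $H_1$ when the observation exceeds a cutoff. The names `best_test_beta` and `SHIFT` are hypothetical. Because the model is assumed, the middle point it produces need not coincide with the illustrative point (0.02, 0.02) above.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def best_test_beta(alpha: float, shift: float) -> float:
    """Type II error of the most powerful level-alpha test of
    H1: X ~ N(0, 1) against H2: X ~ N(shift, 1), one observation.

    The test rejects H1 when X exceeds the (1 - alpha) quantile under H1.
    (Assumed illustrative model, not the paper's linkage example.)
    """
    cutoff = STD.inv_cdf(1.0 - alpha)
    return STD.cdf(cutoff - shift)


# Hypothetical separation, chosen so that alpha = 0.01 pairs with beta = 0.05.
SHIFT = STD.inv_cdf(0.99) + STD.inv_cdf(0.95)

# Each alpha has its own best test; none is singled out by the theory.
frontier = [(a, best_test_beta(a, SHIFT)) for a in (0.01, 0.02, 0.05)]
for a, b in frontier:
    print(f"alpha = {a:.2f}  beta = {b:.4f}")
```

Every point printed is "best" in the Neyman-Pearson sense for its own $\alpha$; the sketch only exhibits the trade-off and says nothing about which point to choose.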

For a given application such as our linkage investigation, nothing in the Neyman-Pearson theory nor the confidence concept leads to a particular choice among these, yet choices of this kind are always made, implicitly if not explicitly, whenever the confidence concept is applied.

Another aspect of the ad hoc character of the confidence concept is its great potential flexibility in applications, which has not been very widely exploited. We may illustrate this in the preceding problem of two simple hypotheses, where three possible tests were considered. We may define a generalized kind of test of statistical hypotheses in terms of a formal decision function taking three (rather than the usual two) possible values, as follows: The decision function takes the possible values:

$d_1$: strong evidence for $H_2$ as against $H_1$

$d_2$: neutral or weak evidence

$d_3$: strong evidence for $H_1$ as against $H_2$.

It takes the value $d_1$ on those sample points where the test characterized by (0.01, 0.05) would reject $H_1$; it takes the value $d_3$ on those points where the test (0.05, 0.01) would accept $H_1$; and it takes the value $d_2$ on the remaining sample points. Such a 'three-decision' test requires a schema of a new form to represent its more numerous error probabilities, which are defined as follows:

$\alpha_1 = \mathrm{Prob}(d_1 \mid H_1)$ = probability of a major error of Type I

$\alpha_2 = \mathrm{Prob}(d_2 \mid H_1)$ = probability of a minor error of Type I

$\beta_1 = \mathrm{Prob}(d_3 \mid H_2)$ = probability of a major error of Type II

$\beta_2 = \mathrm{Prob}(d_2 \mid H_2)$ = probability of a minor error of Type II
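The four error probabilities of the 'three-decision' test just defined can be worked out from the two component tests. The following sketch does this under one explicit assumption that is not stated in the text: the rejection region of the test (0.01, 0.05) is contained in the rejection region of the test (0.05, 0.01), as is the case for nested likelihood-ratio tests. The function name `three_decision_errors` is hypothetical.

```python
from fractions import Fraction


def three_decision_errors(strict, lenient):
    """Error probabilities of the three-decision test built from two tests.

    strict  : (alpha, beta) of the test whose rejection of H1 gives d1
    lenient : (alpha, beta) of the test whose acceptance of H1 gives d3

    Assumes the strict rejection region lies inside the lenient one,
    so that d2 is 'rejected by the lenient test but not by the strict'.
    """
    a_strict, b_strict = strict
    a_lenient, b_lenient = lenient
    return {
        "alpha1": a_strict,              # Prob(d1 | H1): major Type I error
        "alpha2": a_lenient - a_strict,  # Prob(d2 | H1): minor Type I error
        "beta1": b_lenient,              # Prob(d3 | H2): major Type II error
        "beta2": b_strict - b_lenient,   # Prob(d2 | H2): minor Type II error
    }


errors = three_decision_errors(
    strict=(Fraction("0.01"), Fraction("0.05")),
    lenient=(Fraction("0.05"), Fraction("0.01")),
)
for name, value in errors.items():
    print(name, float(value))
```

Exact fractions are used so that the differences are not blurred by floating-point rounding.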

(It follows from the assumption that the original tests were best, that these error probabilities are minimized jointly in the usual sense. The ad hoc character of two-decision tests has not been eliminated, but reappears in such three-decision tests; and is illustrated once more by considering the possible alternative four-decision test which could be determined similarly by using also the test characterized by (0.02, 0.02) above. In contrast the likelihood approach, and the technically related Bayesian approaches, are formally elegant, allowing intuitively plausible direct interpretations of all possible numerical values of the likelihood ratio statistic as indicating strength of statistical evidence in this problem.)

As other examples of methods for implementation of the confidence concept, outside the familiar categories of testing and of estimation by confidence regions, we may mention nested confidence regions and related tests (e.g. Birnbaum, 1961; Dempster and Schatzoff, 1965; Stone, 1969); and methods for 'generalized testing' among three or more alternative statistical hypotheses and for classification (e.g. Birnbaum and Maxwell, 1960).

Among theoretical contributions specifically concerned with apparent difficulties or impossibilities in the way of giving a precise general theoretical treatment of the confidence concept and associated concepts, we may mention Barndorff-Nielsen (1971, 1973), Buehler (1959), Buehler and Fedderson (1963), Birnbaum (1969, 1970, 1972b), Cox (1971), and Durbin (1970).

The confidence concept depends in principle upon an extra-mathematical interpretation of the error probabilities which appear in schemas like that above, and this interpretation is usually described as a 'frequentist' or 'objectivist' interpretation; and the same terms are often used to describe the whole approach based on the confidence concept.

The two theoretical interpretations of the 'decision' concept discussed above have analogues in interpretations of probabilities. The term propensity interpretation has become widely used among philosophers in recent years to denote the kinds of 'objective' interpretation which seem appropriate and accurate for many theoretical terms in science, including probability. (See for example Mellor, 1971; Hacking, 1965; Braithwaite, 1954.) The confidence concept seems to call for this kind of interpretation of error probabilities, rather than any more direct frequency (literal, operationalist, behavioristic) interpretation, as we have indicated in earlier discussion of the confidence concept. On this view, criticisms of frequency interpretations of probability, as against propensity interpretations, are not relevant to the confidence concept. (Presumably any rounded interpretation of probability in a scientific discipline would specify a role for concepts of statistical evidence, and perhaps also for the notion of 'practical certainty' associated with some applications, among the aspects of meaning associated with probability and related theoretical terms, such as 'genetic factor' in Mendelian genetics.)

We shall not attempt to survey the current status of the confidence concept in theory and applications. This would be a formidable task, since it would call for an account of the largely implicit interpretations of standard statistical methods in a great variety of scientific research disciplines, and in a large and growing statistical literature including theoretical and expository works. It is hoped that the present paper will prove helpful to the interested reader as he makes his own observations and judgements concerning the nature of standard theoretical and applied work in various statistical disciplines.

8. OBJECTIONS TO A BASIC ASSUMPTION OF THE LINDLEY-SAVAGE ARGUMENT FOR BAYESIAN THEORY

One of the important and influential theoretical arguments for Bayesian theory is the Lindley-Savage argument. We shall show here that this argument has no direct relevance nor persuasive force, as an argument for Bayesian methods as against typical standard statistical practice with scientific research data, by showing that an assumption of the argument holds only for 'decisions' under behavioral interpretations, but not under the evidential interpretations which constitute standard statistical practice.

The argument is elementary, being formulated in terms of simple examples of tests (decision functions) like those above. The original somewhat informal accounts of the argument by Savage (1962, pp. 173-5) and Lindley (1971, p. 13-14) should be read by the interested reader. They are complemented by a formalized version of the argument, with additional discussion, in an appendix below.

The Lindley-Savage argument concerns judgements of preference or of indifference (equivalence) between alternative decision functions (tests) in problems of two simple hypotheses, with each decision function represented by its point $P = (\alpha, \beta)$ in the unit square, determined by its error probabilities $\alpha$ and $\beta$. Our discussion will be based on some simple examples of statistical evidence given above, which we continue to express in the first person usage.

Examples. In some research situations I would strongly prefer to use a decision function (test) characterized by (0.05, 0.05) rather than one characterized by (0.1, 0). In such situations I particularly value the guarantee, which is provided by use of (0.05, 0.05), that strong evidence will be obtained (either supporting $H_1$ against $H_2$, or supporting $H_2$ against $H_1$). The use of (0.1, 0) allows the possibility that merely weak evidence, represented by (reject $H_1$ for $H_2$, 0.1, 0), will be obtained. For example, the knowledge in the background of a linkage investigation may include strong (though not conclusive) statistical evidence for the locations of all but one of the genetic factors which control a certain system of immune reactions; and the current investigation may have as its object just to determine whether the remaining factor lies on chromosome No. 1 or No. 2. Let $H_1$ now stand for the hypothesis of linkage with another factor known to lie on No. 1, and $H_2$ the alternative hypothesis. In this situation I would consistently avoid the risk of getting merely weak evidence by choosing (0.05, 0.05) rather than (0.1, 0); and would be able to complete the pattern of knowledge (chromosome map) of the system on a basis of strong evidence.

Similarly, in some situations (including the same linkage investigation), I would prefer (0.05, 0.05) to (0, 0.1), for similar reasons. In some situations (including the same linkage investigation) I would be indifferent as between (0.1, 0) and (0, 0.1), on grounds of their symmetry and of judgements of symmetry concerning the investigation in question. This pattern of preferences may be summarized by

$(0.05, 0.05) > (0.1, 0) \sim (0, 0.1),$

where $>$ stands for 'is preferred to' and $\sim$ stands for 'is equivalent to.'

This pattern of preferences is incompatible with Assumption (II) of the Lindley-Savage argument as formulated in the appendix. (It is also incompatible with a basic premise underlying Wald's theory, as will be indicated in the next section.) This example suffices to illustrate that that assumption is not satisfied generally by the 'decision' concept associated with statistical tests as interpreted evidentially (not behaviorally) in typical research applications.

A different but analogous example incompatible with Assumption (II) is the preference pattern

$(0.1, 0) \sim (0, 0.1) > (0.05, 0.05).$

In some research situations I would have this preference pattern. In particular, if the knowledge in the background of a linkage investigation includes conclusive statistical evidence for the locations of all but one of the factors which control certain immune reactions, then with certain scientific goals in view I would strongly prefer, rather than the guarantee of strong (but inconclusive) evidence provided by (0.05, 0.05), the uncertain possibility of completing with conclusive evidence the pattern of knowledge in question which is provided by either (0.1, 0) or (0, 0.1); and I would be indifferent as between them.

Assumption (II) expresses in one important way the concept of rationality (or consistency, or coherence) which is central to all statistical decision theories. Our criticism of this assumption and the concept it expresses may serve as a warning against oversimplified judgements of 'irrationality' (or 'inconsistency', or 'incoherence').

9. COMMENTS ON A BASIC PREMISE OF WALD'S DECISION THEORY

'Mixtures' of decision functions play important technical and theoretical roles in the development of Wald's (1950) statistical decision theory. An example of a mixture is symbolized by

$M = \tfrac{1}{2}(0, 0.1) + \tfrac{1}{2}(0.1, 0).$

Here (0, 0.1) and (0.1, 0) represent as before two decision functions (tests), characterized by their respective pairs of error probabilities. The whole expression $M$ stands for another decision function defined in terms of those two decision functions and an auxiliary randomization variable, as follows: If a toss of a fair coin shows heads, say, the decision function (0, 0.1) is applied to the observed sample point; otherwise (0.1, 0) is applied.

To determine the error probabilities which characterize the decision function $M$, we find readily

$(\alpha, \beta) = \tfrac{1}{2}(0, 0.1) + \tfrac{1}{2}(0.1, 0) = (0.05, 0.05).$

(For example, under $H_1$ the respective error probabilities are 0, if (0, 0.1) is applied; and 0.1, if (0.1, 0) is applied; and each will be applied with probability $\tfrac{1}{2}$.)

The preceding discussion is based on a tacit assumption of a behavioral, and not an evidential, interpretation of the decision functions considered.


One way of illustrating this is by reference to an example of the preceding section: Suppose my preference pattern includes

$(0, 0.1) \sim (0.1, 0) > (0.05, 0.05).$

Then it is plausible that I may be indifferent also as between (0, 0.1) and $M$, since an application of the latter will provide me with an application of (0, 0.1) or else an application of (0.1, 0) which I regard as equally satisfactory. But this implies that my preference pattern includes

$M > (0.05, 0.05),$

or, representing $M$ now by its pair of error probabilities as determined above,

$(0.05, 0.05) > (0.05, 0.05)$

which is absurd.
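The arithmetic behind the mixture, and the reason the last comparison collapses, can be checked with a short sketch. It encodes only the behavioral reading, in which a decision function is identified with its pair of error probabilities; the helper name `mixture` is hypothetical and introduced here for illustration.

```python
from fractions import Fraction


def mixture(components):
    """Error probabilities of a randomized mixture of decision functions.

    components: list of (weight, (alpha, beta)) pairs, weights summing to 1.
    Under the behavioral reading each error probability of the mixture is
    the weighted average of the components' error probabilities.
    """
    if sum(weight for weight, _ in components) != 1:
        raise ValueError("mixture weights must sum to 1")
    alpha = sum(weight * point[0] for weight, point in components)
    beta = sum(weight * point[1] for weight, point in components)
    return alpha, beta


half = Fraction(1, 2)
M = mixture([
    (half, (Fraction(0), Fraction(1, 10))),   # the test (0, 0.1)
    (half, (Fraction(1, 10), Fraction(0))),   # the test (0.1, 0)
])
target = (Fraction(1, 20), Fraction(1, 20))   # the test (0.05, 0.05)

# M and (0.05, 0.05) are the same point, so 'M preferred to (0.05, 0.05)'
# would be a strict preference of a point over itself.
same_point = (M == target)
print(M, same_point)
```

The sketch shows only that the two symbols name the same point once the behavioral identification is made; it does not represent the evidential reading, which is exactly where the text locates the fallacy.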

The fallacy in the preceding discussion is that the preference pattern first assumed above arose in an example of evidential interpretations of 'decisions', while the calculation of the preceding paragraph was based on a behavioral interpretation. In particular, the preference for (0, 0.1) as against (0.05, 0.05) was based on a particularly high value ascribed to the possibility of statistical evidence symbolized by

(reject $H_1$ for $H_2$, 0, 0.1),

in which the 'decision' ('reject $H_1$ for $H_2$') appears within the symbol for an evidential interpretation.

On the other hand, in the calculation of the error probability

$\alpha = \tfrac{1}{2}(0) + \tfrac{1}{2}(0.1) = 0.05$

above, we considered just the 'decision' ('reject $H_1$ for $H_2$'), without the qualifications concerning the error probabilities which characterize the respective different decision functions (schemas) from which that 'decision' can result; that is, we tacitly interpreted that 'decision' behaviorally.

The general point illustrated is that while behavioral interpretations of 'decisions' may play a very valuable heuristic role in the mathematical development of the Neyman-Pearson and Wald statistical theories, methods developed within those theories can and must be interpreted (or reinterpreted) with care when considered for possible use with evidential interpretations.

APPENDIX. ON THE LINDLEY-SAVAGE ARGUMENT FOR BAYESIAN THEORY9

The Lindley-Savage argument takes as its point of departure a recognized problem encountered whenever the (non-Bayesian) theories of Neyman-Pearson and Wald are to be applied, the problem illustrated above as one source of the ad hoc character of the confidence concept: that of choosing among the various best tests (decision functions) $(\alpha, \beta)$ available for a given application. The argument shows that if this problem of choice is treated 'rationally' (or 'consistently', or 'coherently') in a sense discussed above in Section 8, then 'you' are "a Bayesian, whether you thought you wanted to be or not.... Thus, the Bayesian position can be viewed as a natural completion, an overlooked step in the classical theory." (Savage, 1962, p. 175.)

The last comments refer to the final step of the argument, which may be illustrated in prototype as follows: Suppose you judge as equivalent, for a given application, three decision functions characterized respectively by

(0, 0.1), (0.05, 0.05), and (0.1, 0).

Then ... "you" are "a Bayesian, whether you thought you wanted to be or not..." in the sense that your preference behavior, in this context, is indistinguishable from that of a Bayesian; for example, a Bayesian who ascribes prior probabilities $g_1$ and $g_2$ respectively to $H_1$ and $H_2$, and losses $L_1$ and $L_2$ respectively to the errors of the first and second types, will also be indifferent as between those three decision functions, provided $g_1 L_1 = g_2 L_2$. Such 'indistinguishability' represents an aspect of the behaviorist point of view which is basic to Savage's Bayesian decision theory. But clear and important distinctions of viewpoints are evident here from the standpoint of a non-Bayesian who may have a decision problem in the behavioral sense but who may wish to reach a conclusion (in the sense discussed above) as a basis for making a decision, perhaps because he regards no complete model of a decision problem, including loss functions, as clearly accurate. Important distinctions are clear also from the standpoint of an investigator who has no decision problem except in the sense of evidential interpretations under the confidence concept, and finds no place in his thinking for loss functions nor Bayesian probabilities of statistical hypotheses, even if he may be indifferent in a given research context between three tests represented by the three points above.

The final step of the argument, just discussed in prototype, follows a more formalized argument whose conclusion is that 'you' have a preference pattern among tests (decision functions) characterized by indifference sets consisting of parallel line segments which cover the unit square of points $(\alpha, \beta)$ (and thus coinciding with certain Bayesian preference patterns), including for example $PP'$ and $QQ'$ in Figure 1. We discuss the assumptions of this argument before presenting the derivation itself. The assumptions and derivation are formulated in terms of the mathematical concept of equivalence classes among points of the unit square; the interpretation of interest is that a person's indifference between two tests (decision functions) may be represented by stating that the points characterizing the tests are equivalent.

Fig. 1.
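The prototype Bayesian indifference described above, under the condition $g_1 L_1 = g_2 L_2$, can be illustrated numerically. The sketch below assumes the usual expected-loss form $g_1 L_1 \alpha + g_2 L_2 \beta$ for the Bayes risk of a test $(\alpha, \beta)$; that formula, the function name `bayes_risk`, and the particular prior and loss values are assumptions made here for illustration and are not spelled out in the text.

```python
from fractions import Fraction as F


def bayes_risk(point, g1, loss1, g2, loss2):
    """Expected loss of the test (alpha, beta) for prior (g1, g2)
    and losses (loss1, loss2) for errors of the first and second types."""
    alpha, beta = point
    return g1 * loss1 * alpha + g2 * loss2 * beta


points = [(F(0), F(1, 10)), (F(1, 20), F(1, 20)), (F(1, 10), F(0))]

# Case g1*L1 == g2*L2: all three tests have the same Bayes risk.
balanced = [bayes_risk(p, F(1, 2), 1, F(1, 2), 1) for p in points]

# Case g1*L1 != g2*L2: the risks differ, so the indifference is lost.
unbalanced = [bayes_risk(p, F(1, 4), 1, F(3, 4), 1) for p in points]

print([float(r) for r in balanced])
print([float(r) for r in unbalanced])
```

With balanced weights the indifference sets are the lines $\alpha + \beta = \text{const}$, which is one instance of the parallel line segments described in the text.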


ASSUMPTION (I). There exist two distinct points $P$ and $P'$ which are equivalent.

Possible examples of (I) are $P$ and $P'$ in the figure; and the points (0, 0.1) and (0.1, 0) considered in examples above. This assumption seems free from possible plausible objections, for the following reason. The point (0, 0) is preferred to the point (0, 0.1), and the latter is preferred to (0.1, 0.1), on the basis of the non-controversial principle of inadmissibility (regardless of possible evidential or behavioral interpretations which may be of interest). Consider the respective points $(\alpha, \alpha)$ of the line segment from (0, 0) to (0.1, 0.1), and suppose that you judge that no such point is equivalent to (0, 0.1). Then as $\alpha$ increases continuously from 0, your preferences show an implausible discontinuity at some value of $\alpha$, jumping from 'prefer $(\alpha, \alpha)$ to (0, 0.1)' to 'prefer (0, 0.1) to $(\alpha, \alpha)$' without assuming anywhere the intermediate value 'indifferent between $(\alpha, \alpha)$ and (0, 0.1)'.

Our comments on the second assumption are stated conveniently with reference to a simpler restricted case:

ASSUMPTION (II*). If $P$ and $P'$ are equivalent, then $P$ and $P''$ are also equivalent, where $P'' = kP + (1 - k)P'$ and $k$ is any number between 0 and 1.

For example if $k = \tfrac{1}{2}$, $P = (0, 0.1)$, and $P' = (0.1, 0)$, then $P'' = (0.05, 0.05)$, representing the example of a mixture discussed in Section 9 above, where we found the equivalence of (0.05, 0.05) with (0, 0.1) to be plausible under a behavioral interpretation of 'decisions' but not under an evidential interpretation. In particular we rejected that equivalence in the context of the examples of Section 8.

Assumption (II*) is the special case of the following general assumption in which $R = P = Q$.

ASSUMPTION (II). If $P$ and $P'$ are equivalent, then $Q$ and $Q'$ are also equivalent, where $Q = kP + (1 - k)R$, $Q' = kP' + (1 - k)R$, $R$ may be any point, and $k$ may be any number in the unit interval.

LINDLEY-SAVAGE LEMMA. Assumptions (I) and (II) imply that the unit square is partitioned into equivalence sets, each consisting of a line segment parallel to $PP'$.



Proof: (1) By (I) there exist two distinct equivalent points $P$ and $P'$.

(2) Let $R$ be any point on the perimeter of the unit square. Let $k$ be any number satisfying $0 < k < 1$, and let $Q = kP + (1 - k)R$, and let $Q' = kP' + (1 - k)R$. (See Figure 1. The case of $R$ collinear with $P$ and $P'$ is mentioned below.) By (II), $Q$ and $Q'$ are equivalent, since $P$ and $P'$ are equivalent.

(3) The line segment $QQ'$ is parallel with the segment $PP'$, since the triangles $RQQ'$ and $RPP'$ are similar and have the common vertex $R$.

(4) Let $c$ be any number satisfying $0 < c < 1$, and let $P'' = cP + (1 - c)P'$. $P$ and $P''$ are equivalent, by (II). (The special case (II*) of (II) applies here.)

(5) Since $c$ is arbitrary, it follows that all points of the line segment $PP'$ are equivalent. Similarly all points of the segment $QQ'$ are equivalent.

(6) Since $k$ is arbitrary, it follows that the triangle $RPP'$ is covered by a family of line segments each parallel to $PP'$, each of which is an equivalence class.

(7) As $R$ sweeps out the circumference of the unit square, the square is covered by such triangles; and each triangle is again covered by segments parallel to $PP'$, each segment consisting of equivalent points. (The case of $R$ collinear with $PP'$ is seen at this point not to be special.)

(8) The union of all such segments collinear with $QQ'$ is a single segment between perimeter points of the square; since equivalence is transitive, this interval is an equivalence set. Similarly for other segments mentioned. Thus the unit square is partitioned into equivalence sets, each consisting of a line segment parallel to $PP'$. This completes the proof of the Lemma.†
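Step (3) of the proof rests on the identity $Q' - Q = k(P' - P)$, which follows directly from the definitions of $Q$ and $Q'$. A minimal sketch checking this for a few sample choices of $R$ and $k$ is given below; the helper names `combine` and `cross`, and the particular sample points, are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F


def combine(k, point, r):
    """Return k*point + (1 - k)*r, computed coordinate by coordinate."""
    return tuple(k * a + (1 - k) * b for a, b in zip(point, r))


def cross(u, v):
    """2-D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


P, P_prime = (F(0), F(1, 10)), (F(1, 10), F(0))
d_pp = (P_prime[0] - P[0], P_prime[1] - P[1])

checks = []
# Sample points R on the perimeter of the unit square, and sample values of k.
for R in [(F(0), F(0)), (F(1), F(1, 3)), (F(1, 2), F(1)), (F(0), F(3, 4))]:
    for k in (F(1, 4), F(1, 2), F(9, 10)):
        Q = combine(k, P, R)
        Q_prime = combine(k, P_prime, R)
        d_qq = (Q_prime[0] - Q[0], Q_prime[1] - Q[1])
        # QQ' is parallel to PP' and shorter by exactly the factor k.
        checks.append(cross(d_pp, d_qq) == 0
                      and d_qq == (k * d_pp[0], k * d_pp[1]))

all_parallel = all(checks)
print(all_parallel)
```

The check is a finite spot test of the geometric step, not a substitute for the proof.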

University College, London

† Editors' Note. The proofs of the present paper were ready only after the death of Professor Allan Birnbaum. The proofs were kindly checked by the staff of The City University, London and the University College, London. It was found that the bibliography was incomplete, and even though several corrections and additions were made, there still remain gaps in the bibliographical data.



NOTES

* The writer is grateful for helpful discussions of earlier versions of parts of this material with many colleagues, particularly E. S. Pearson, J. Pratt, C. A. B. Smith, D. V. Lindley, A. P. Dawid, G. Robinson, B. Norton, and M. Stone.

1 The term 'rule of behavior' made its appearance, linked with the term 'decide', in the 1933 paper, in the discussion introducing the formulation of the problem of testing statistical hypotheses (p. 291, original; p. 142, reprint). Subsequently the concept of 'inductive behavior' was elaborated and supported, in opposition to various other concepts of statistical inference ('inductive reasoning'), by Neyman (1947, 1957, 1962, 1971).

2 Among geneticists who are also prominent theoretical statisticians, the decision concept (at least in its behavioral interpretation) has been rejected as inappropriate in scientific data analysis, from different standpoints in statistical theory, by:
1. O. Kempthorne, from the standpoint of standard methods interpreted in a non-behavioral way similar to that discussed below (for example, 1971, pp. 471-3, 489);
2. C. A. B. Smith, who has developed a version of Bayesian theory, and has led in the use of Bayesian methods in genetics in scientific publications (1959, p. 297);
3. A. W. F. Edwards, an exponent of the likelihood approach, who has applied that approach in genetics in his scientific publications (1972); and
4. R. A. Fisher (for example, 1956, pp. 100-103).

The case of two simple hypotheses is unrealistic for problems of testing linkage, where a composite statistical hypothesis is generally adopted to represent the scientific hypothesis of linkage. However the simplified model of two simple hypotheses entails no sacrifice of realism with respect to the questions of interpretation considered in this paper. On the contrary, typical formulations of linkage tests in practice often make use of a simple hypothesis, for technical reasons, to represent effectively a more realistic composite hypothesis (Morton, 1955; Smith, 1953, pp. 180-183).

Analogous comments apply to the limited realism of our discussion of the example of the lamp manufacturer: It turns out that the realistic composite hypothesis representing good lot quality (at most 4% defective) is, for technical reasons, represented effectively by the simple hypothesis (exactly 4% defective), in the sense that the value $\alpha$ characterizing any ('admissible') decision function for the simplified problem is also an upper bound of error probabilities over the realistic composite hypothesis. Similar comments apply to the alternative hypothesis.

3 The essential point epitomized here is that there is a distinction of levels of language, the first phrase occurring in the 'object language' of things and behavioral acts, the second in the 'metalanguage' in which we discuss a certain statement (hypothesis). Apparent exceptions to the epitomization require explanation in the preceding terms. For example, in a scientific research context 'to decide that a certain hypothesis is supported by strong evidence' is tantamount to 'to decide to make the statement that the hypothesis is supported by strong evidence.' The apparently exceptional occurrence here of 'decide to' with an evidential reference is explained by pointing out that 'to make a statement' occurs here in the metalanguage (where all evidential considerations are expressed), and so is not a case of 'to act' when that phrase occurs in the object language, where it has behavioral interpretations.

4 Examples of joint consideration of these aspects of simple genetics research problems will be found, for example, in Smith (1968) and Mendel (1866). The present writer will offer an extended discussion of such considerations in another paper, 'Mendelian genetics: a case study in the structure of science.'


5 Even in applications where a behavioral interpretation of 'decisions' clearly applies, the scope of applications of complete formal models of decision problems has had a slow and limited development (see, for example, Brown, 1970); possibly due in part to considerations discussed above.

6 In the traditional formulation of testing problems the counterpart of the error probability $\alpha$ was the 'probability level' statistic $P = P(x)$. The theoretical aspect of the traditional formulation is a concept of statistical evidence associated with that statistic, under which $P(x)$ is interpreted as an index of strength of evidence against the hypothesis $H_1$, with smaller values of $P(x)$ indicating stronger evidence. Thus the traditional interpretation is evidential and not behavioral (in any direct sense), and the behavioral interpretation was an innovation of the Neyman-Pearson theory.

In many applications the statistic $P(x)$ was (and is) interpreted schematically, in terms of a dichotomy such as: the statistical evidence is strong against $H_1$ if and only if $P(x) \le 0.05$. Here 0.05 corresponds to the error probability $\alpha$ in our schema; and the schematized form of the traditional formulation can be represented by a formal decision function which takes the value 'reject $H_1$' if and only if the observed sample point $x$ gives $P(x) \le 0.05$.

7 This is not to deny that there is any behavioral (literal) realization of certain relative frequencies of errors, approximating the error probabilities in the schema representing a test, in certain long series (actual or conceivable) of applications of tests of the same form. What is suggested is that such a behavioral interpretation is related in a somewhat abstract, indirect (hypothetical or theoretical) way to the evidential interpretation of a single application of a test in a given research situation. This theoretical relation of the evidential meaning of a 'decision' in such an application, to a certain behavioral interpretation of the same formal 'decision' in another context (a series of applications), does not reduce or eliminate evidential interpretations in favor of behavioral ones. On the contrary, appreciation of such a behavioral interpretation, coupled with appreciation of the hypothetical theoretical relation it bears to an evidential interpretation in the given research situation, may be regarded as an important part of appreciation of the meaning of statistical evidence as interpreted under the confidence concept.

8 The likelihood approach (Edwards, 1972) is based on a primitive concept of statistical evidence which appears closely analogous to our formulation (Conf) of the confidence concept, but which nevertheless does not satisfy the latter nor provide the kind of theoretical control of error probabilities mentioned above. It was rejected by Neyman and Pearson in favor of the confidence concept in their 1933 paper, after they had used it as the basis of their exploratory 1928 paper. A detailed discussion of incompatibilities between the two concepts is given in Birnbaum (1969).

The likelihood concept may be formulated thus:

(L'): If an observed sample point has very small probability (density) under $H_1$, relative to its probability (density) under $H_2$, then it provides strong statistical evidence for $H_2$ as against $H_1$.

The likelihood and confidence concepts were taken up successively by Neyman and Pearson as plausible successors to the simpler primitive concept of statistical evidence which has been associated (usually implicitly) with tests in their traditional formulation, which has been represented in applications since 1710. Both (Conf) and (L') may be considered as assimilating, in analogous ways, that traditional concept, which may be formulated thus:

(P): A concept of statistical evidence is not plausible unless it finds 'strong evidence against $H_1$' with very small probability when $H_1$ is true.


In traditional practice this concept had been complemented by unformalized judgement exercised in the devising and selection of test statistics, which were then interpreted as indices of strength of statistical evidence against a hypothesis H1, without explicit reference to alternative hypotheses.

Each of the concepts of evidence mentioned may be regarded as a refined version of that simpler familiar intuitive concept which moves us, when something observed seems 'improbable' or 'unlikely' (in any sense, often not specified explicitly), toward reconsideration of some hypothesis, perhaps only tacitly held.

9 The reader is urged to compare this discussion with the original versions of the argument by Savage and Lindley cited in Section 8.
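The schematized decision function of note 6 and the likelihood concept (L') of note 8 can be illustrated by a minimal sketch. It assumes a hypothetical normal model with known standard deviation and two simple hypotheses, H1: mean 0 and H2: mean 1; the model, the function names, and the parameter defaults are illustrative assumptions and do not come from the text.

```python
from math import exp, sqrt
from statistics import NormalDist, fmean

ALPHA = 0.05  # schematic cutoff used in note 6


def p_level(sample, mu1=0.0, sigma=1.0):
    """One-sided probability level P(x) for H1: mean = mu1.

    Assumes independent normal observations with known sigma
    (an assumption made only for this illustration).
    """
    z = (fmean(sample) - mu1) * sqrt(len(sample)) / sigma
    return 1.0 - NormalDist().cdf(z)


def decision(sample, alpha=ALPHA):
    """Formal decision function: 'reject H1' if and only if P(x) <= alpha."""
    return "reject H1" if p_level(sample) <= alpha else "do not reject H1"


def likelihood_ratio(sample, mu1=0.0, mu2=1.0, sigma=1.0):
    """Density of the sample under H1 divided by its density under H2.

    Under (L'), a very small ratio counts as strong evidence
    for H2 as against H1.
    """
    log_ratio = sum((x - mu2) ** 2 - (x - mu1) ** 2 for x in sample)
    return exp(log_ratio / (2.0 * sigma ** 2))
```

Under these assumptions, `decision([2.0, 2.0, 2.0, 2.0])` takes the value 'reject H1', while `likelihood_ratio` applied to the same sample returns a value far below 1, so the two concepts point the same way here. They need not agree in general, which is the incompatibility that note 8 refers to.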

BIBLIOGRAPHY

Barndorff-Nielsen, O., 1971, On Conditional Statistical Inference (mimeographed), Aarhus.

Birnbaum, A., 1961, 'Confidence Curves: An Omnibus Technique for Estimation and Testing Statistical Hypotheses', Journal of the American Statistical Association 56 (1961), 246-249.

Birnbaum, A., 1962, 'On the Foundations of Statistical Inference', Journal of the American Statistical Association 57 (1962), 269-326 (with discussion).

Birnbaum, A., 1969, 'Concepts of Statistical Evidence', in Philosophy, Science, and Method: Essays in Honor of Ernest Nagel (ed. by Sidney Morgenbesser, Patrick Suppes, and Morton White), St. Martin's Press, New York.

Birnbaum, A., 1970, 'On Durbin's Modified Principle of Conditionality', Journal of the American Statistical Association 65 (1970), 402-403.

Birnbaum, A., 1972a, 'The Random Phenotype Concept, with Applications', Genetics 72 (1972), 739-758.

Birnbaum, A., 1972b, 'More on Concepts of Statistical Evidence', Journal of the American Statistical Association 67 (1972), 858-861.

Birnbaum, A. and Maxwell, A. E., 1960, 'Classification Procedures Based on Bayes Formula', Applied Statistics 9 (1960), 152-159.

Braithwaite, R. B., 1954, Scientific Explanation, Cambridge University Press.

Brown, R. V., 1970, 'Do Managers Find Decision Theory Useful?', Harvard Business Review, May-June.

Buehler, R. J., 1959, 'Some Validity Criteria for Statistical Inference', Annals of Mathematical Statistics 30 (1959), 845-863.

Buehler, R. J. and Fedderson, A. P., 1963, 'Note on a Conditional Property of Student's t', Ann. Math. Statist. 34 (1963), 1098-1100.

Cox, D. R., 1958, 'Some Problems Connected with Statistical Inference', Annals of Mathematical Statistics 29 (1958), 357-372.

Cox, D. R., 1971, 'The Choice Between Alternative Ancillary Statistics', Journal of the Royal Statistical Society 33 (B) (1971), 251-255.

Dempster, A. P. and Schatzoff, M., 1965, 'Expected Significance Level as a Sensitivity Index for Test Statistics', Journal of the American Statistical Association 60 (1965), 420-436.

Durbin, J., 1970, 'On Birnbaum's Theorem of the Relation Between Sufficiency, Conditionality, and Likelihood', Journal of the American Statistical Association 65 (1970), 395-398 (followed by two discussion notes).

Edwards, A. W. F., 1972, Likelihood, Cambridge University Press.

Fisher, R. A., 1956, Statistical Methods and Scientific Inference, Oliver and Boyd, Edinburgh.

Hacking, Ian, 1965, The Logic of Statistical Inference, Cambridge University Press.

Jeffreys, H., 1961, Theory of Probability, 3rd ed., Oxford University Press, London.

Lindley, D. V., 1971, 'Bayesian Statistics, A Review', Society for Industrial and Applied Mathematics, Philadelphia.

Mellor, Hugh, 1971, The Matter of Chance, Cambridge University Press.

Mendel, G., 1866, 'Versuche über Pflanzenhybriden', Verhandlungen der Naturforschenden Vereins in Brunn 4 (1865), 3-44. (English translation in Experiments in Plant Hybridization, ed. by J. H. Bennett, 1965, Oliver and Boyd.)

Morton, N. E., 1955, 'Sequential Tests for the Detection of Linkage', American Journal of Human Genetics 7 (1955), 277-318.

Nagel, Ernest, 1961, The Structure of Science, Harcourt-Brace, New York.

Neyman, J., 1938 (2nd ed., 1952), Lectures and Conferences on Mathematical Statistics, Graduate School, U.S. Department of Agriculture, Washington D.C.

Neyman, J., 1947, 'Raisonnement inductif ou comportement inductif', Proceedings of the International Statistical Conference 3, 423-433.

Neyman, J., 1957, 'Inductive Behavior as a Basic Concept of Philosophy of Science', Review of the International Statistical Institute 25 (1957), 7-22.

Neyman, J., 1962, 'Two Breakthroughs in the Theory of Statistical Decision-Making', Review of the International Statistical Institute 30 (1962), 11-27.

Neyman, J. and Pearson, E. S., 1928, 'On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference, Part I', Biometrika 20A (1928), 175-240.

Neyman, J. and Pearson, E. S., 1933, 'On the Problem of the Most Efficient Tests of Statistical Hypotheses', Philosophical Transactions of the Royal Society of London 231 (A), 289-337 (pp. 140-185 in 1967 reprinting).

Neyman, J. and Pearson, E. S., 1936, 'Contributions to the Theory of Testing Statistical Hypotheses', Statistical Research Memoirs vol. I, pp. 113-137 (pp. 203-239 in 1967 reprinting).

Neyman, J. and Pearson, E. S., 1967, Joint Statistical Papers, Cambridge University Press.

Pearson, E. S., 1966, The Selected Papers of E. S. Pearson, University of California Press, Berkeley, California.

Renwick, J., 1971, 'The Mapping of Human Chromosomes', Annual Review of Genetics 5 (1971), 81-120.

Robinson, G. K., 1974, 'Conditional Confidence Properties of Student's t and of the Behrens-Fisher Solution to the Two Means Problem', unpublished.

Savage, Leonard J., 1954, The Foundations of Statistics, Wiley, New York.

Savage, Leonard J., 1962, 'Bayesian Statistics', in Recent Developments in Information and Decision Processes (ed. by R. E. Machol and P. Gray), Macmillan, N.Y. and London.

Smith, Cedric A. B., 1953, 'The Detection of Linkage in Human Genetics', Journal of the Royal Statistical Society 15 (B) (1953), 155-192.

Smith, Cedric A. B., 1959, 'Some Comments on the Statistical Methods Used in Linkage Investigations', American Journal of Human Genetics 11 (1959), 289-403.

Smith, Cedric A. B., 1965, 'Personal Probability and Statistical Analysis, with Discussion', Journal of the Royal Statistical Society 128 (A) (1965), 469-499.

Smith, Cedric A. B., 1968, 'Linkage Scores and Corrections in Simple Two- and Three-Generation Families', Annals of Human Genetics 33 (1968), 127-150.

Stone, M., 1969, 'The Role of Significance Testing: Some Data with a Message', Biometrika 56 (1969), 485-493.

Tukey, J. W., 1960, 'Conclusions v. Decisions', Technometrics 2 (1960), 423-433.

Wald, A., 1950, Statistical Decision Functions, Wiley, New York.

