Mathematical Statistics 1st Edition Dieter Rasch
University of Natural Resources and Life Sciences, Institute of Applied Statistics and Computing Vienna, Austria
Faculty of Engineering, Hochschule Wismar, University of Applied Sciences: Technology, Business and Design Wismar, Germany
Preface
1 Basic Ideas of Mathematical Statistics
1.1 Statistical Population and Samples
1.1.1 Concrete Samples and Statistical Populations
1.1.2 Sampling Procedures
1.2 Mathematical Models for Population and Sample
1.3 Sufficiency and Completeness
1.4 The Notion of Information in Statistics
1.5 Statistical Decision Theory
1.6 Exercises
References
2 Point Estimation
2.1 Optimal Unbiased Estimators
2.2 Variance-Invariant Estimation
2.3 Methods for Construction and Improvement of Estimators
2.3.1 Maximum Likelihood Method
2.3.2 Least Squares Method
2.3.3 Minimum Chi-Squared Method
2.3.4 Method of Moments
2.3.5 Jackknife Estimators
2.3.6 Estimators Based on Order Statistics
L-Estimators
M-Estimators
2.4 Properties of Estimators
2.4.1 Small Samples
2.4.2 Asymptotic Properties
2.5 Exercises
References
3 Statistical Tests and Confidence Estimations
3.1 Basic Ideas of Test Theory
3.2 The Neyman–Pearson Lemma
3.3 Tests for Composite Alternative Hypotheses and One-Parametric Distribution Families
3.3.1 Distributions with Monotone Likelihood Ratio and Uniformly Most Powerful Tests for One-Sided Hypotheses
3.3.2 UMPU-Tests for Two-Sided Alternative Hypotheses
3.4 Tests for Multi-Parametric Distribution Families
3.4.1 General Theory
3.4.2 The Two-Sample Problem: Properties of Various Tests and Robustness
3.4.3 Comparison of Two Variances
3.4.4 Table for Sample Sizes
3.5 Confidence Estimation
3.5.1 One-Sided Confidence Intervals in One-Parametric Distribution Families
3.5.2 Two-Sided Confidence Intervals in One-Parametric and Confidence Intervals in Multi-Parametric Distribution Families
3.5.3 Table for Sample Sizes
3.6 Sequential Tests
3.6.1 Introduction
3.6.2 Wald’s Sequential Likelihood Ratio Test for One-Parametric Exponential Families
3.6.3 Test about Mean Values for Unknown Variances
3.6.4 Approximate Tests for the Two-Sample Problem
3.6.5 Sequential Triangular Tests
3.6.6 A Sequential Triangular Test for the Correlation Coefficient
3.7 Remarks about Interpretation
3.8 Exercises
References
4 Linear Models – General Theory
4.1 Linear Models with Fixed Effects
4.1.1 Least Squares Method
4.1.2 Maximum Likelihood Method
4.1.3 Tests of Hypotheses
4.1.4 Construction of Confidence Regions
4.1.5 Special Linear Models
4.1.6 The Generalised Least Squares Method (GLSM)
4.2 Linear Models with Random Effects: Mixed Models
4.2.1 Best Linear Unbiased Prediction (BLUP)
4.2.2 Estimation of Variance Components
4.3 Exercises
References
5 Analysis of Variance (ANOVA) – Fixed Effects Models (Model I of Analysis of Variance)
5.1 Introduction
5.2 Analysis of Variance with One Factor (Simple- or One-Way Analysis of Variance)
5.2.1 The Model and the Analysis
5.2.2 Planning the Size of an Experiment
5.3 Two-Way Analysis of Variance
5.3.1 Cross-Classification (A × B)
5.3.2 Nested Classification (A ≻ B)
5.4 Three-Way Classification
5.4.1 Complete Cross-Classification (A × B × C)
5.4.2 Nested Classification (C ≺ B ≺ A)
5.4.3 Mixed Classification
Subordinated to a Third Factor B ≺ A × C
Nested C ≺ A × B
5.5 Exercises
References
6 Analysis of Variance: Estimation of Variance Components (Model II of the Analysis of Variance)
6.1 Introduction: Linear Models with Random Effects
6.2 One-Way Classification
6.2.1 Estimation of Variance Components
6.2.2 Tests of Hypotheses and Confidence Intervals
6.2.3 Variances and Properties of the Estimators of the Variance Components
6.3 Estimators of Variance Components in the Two-Way and Three-Way Classification
6.3.1 General Description for Equal and Unequal Subclass Numbers
6.3.2 Two-Way Cross-Classification
6.3.3 Two-Way Nested Classification
6.3.4 Three-Way Cross-Classification with Equal Subclass Numbers
6.3.5 Three-Way Nested Classification
6.3.6 Three-Way Mixed Classification
6.4 Planning Experiments
6.5 Exercises
References
7 Analysis of Variance – Models with Finite Level Populations and Mixed Models
7.1 Introduction: Models with Finite Level Populations
7.2 Rules for the Derivation of SS, df, MS and E(MS) in Balanced ANOVA Models
7.3 Variance Component Estimators in Mixed Models
7.3.1 An Example for the Balanced Case
7.3.2 The Unbalanced Case
7.4 Tests for Fixed Effects and Variance Components
7.5 Variance Component Estimation and Tests of Hypotheses in Special Mixed Models
7.5.1 Two-Way Cross-Classification
7.5.2 Two-Way Nested Classification B ≺ A
A Random
B Random
7.5.3 Three-Way Cross-Classification
7.5.4 Three-Way Nested Classification
7.5.5 Three-Way Mixed Classification
(B ≺ A) × C
C ≺ AB
7.6 Exercises
References
8 Regression Analysis – Linear Models with Non-random Regressors (Model I of Regression Analysis) and with Random Regressors (Model II of Regression Analysis)
8.1 Introduction
8.2 Parameter Estimation
8.2.1 Least Squares Method
8.2.2 Optimal Experimental Design
8.3 Testing Hypotheses
8.4 Confidence Regions
8.5 Models with Random Regressors
8.5.1 Analysis
8.5.2 Experimental Designs
8.6 Mixed Models
8.7 Concluding Remarks about Models of Regression Analysis
8.8 Exercises
References
9 Regression Analysis – Intrinsically Non-linear Model I
9.1 Estimating by the Least Squares Method
9.1.1 Gauss–Newton Method
9.1.2 Internal Regression
9.1.3 Determining Initial Values for Iteration Methods
9.2 Geometrical Properties
9.2.1 Expectation Surface and Tangent Plane
9.2.2 Curvature Measures
9.3 Asymptotic Properties and the Bias of LS Estimators
9.4 Confidence Estimations and Tests
9.4.1 Introduction
9.4.2 Tests and Confidence Estimations Based on the Asymptotic Covariance Matrix
9.4.3 Simulation Experiments to Check Asymptotic Tests and Confidence Estimations
9.5 Optimal Experimental Design
9.6 Special Regression Functions
9.6.1 Exponential Regression
9.6.2 The Bertalanffy Function
9.6.3 The Logistic (Three-Parametric Hyperbolic Tangent) Function
9.6.4 The Gompertz Function
9.6.5 The Hyperbolic Tangent Function with Four Parameters
9.6.6 The Arc Tangent Function with Four Parameters
9.6.7 The Richards Function
9.6.8 Summarising the Results of Sections 9.6.1–9.6.7
9.6.9 Problems of Model Choice
9.7 Exercises
References
10 Analysis of Covariance (ANCOVA)
10.1 Introduction
10.2 General Model I–I of the Analysis of Covariance
10.3 Special Models of the Analysis of Covariance for the Simple Classification
10.3.1 One Covariable with Constant γ
10.3.2 A Covariable with Regression Coefficients γi Depending on the Levels of the Classification Factor
10.3.3 A Numerical Example
10.4 Exercises
References
11 Multiple Decision Problems
11.1 Selection Procedures
11.1.1 Basic Ideas
11.1.2 Indifference Zone Formulation for Expectations
11.1.3 Selection of a Subset Containing the Best Population with Given Probability
11.2 Multiple Comparisons
11.2.1 Confidence Intervals for All Contrasts: Scheffé’s Method
11.2.2 Confidence Intervals for Given Contrasts: Bonferroni’s and Dunn’s Method
11.2.3 Confidence Intervals for All Contrasts for ni = n: Tukey’s Method
11.2.4 Confidence Intervals for All Contrasts: Generalised Tukey’s Method
11.2.5 Confidence Intervals for the Differences of Treatments with a Control: Dunnett’s Method
11.2.6 Multiple Comparisons and Confidence Intervals
11.2.7 Which Multiple Comparison Shall Be Used?
11.3 A Numerical Example
11.4 Exercises
References
12 Experimental Designs
12.1 Introduction
12.2 Block Designs
12.2.1 Completely Balanced Incomplete Block Designs (BIBD)
12.2.2 Construction Methods of BIBD
12.2.3 Partially Balanced Incomplete Block Designs
12.3 Row–Column Designs
12.4 Factorial Designs
12.5 Programs for Construction of Experimental Designs
12.6 Exercises
References
Appendix A: Symbolism
Appendix B: Abbreviations
Appendix C: Probability and Density Functions
Appendix D: Tables
Solutions and Hints for Exercises
Index
‘Mathematical statistics’ has never lost its appeal, both as a mathematical discipline and for its applications in nearly all parts of empirical research. In recent years it has been found that not everything that is mathematically optimal is also recommendable in practice if we are not sure whether the assumptions (for instance, normality) are valid.
As an example we consider the two-sample t-test, which is an optimal (uniformly most powerful unbiased) test if all assumptions are fulfilled. In applications, however, we are often not sure that both variances are equal. Then the approximate Welch test is preferable. Such results have been found by extensive simulation experiments, which have played a much greater role recently (see the eight international conferences about this topic since 1994 under http://iws.
Therefore in 2016 we wrote a new book in German (Rasch and Schott, 2016) based on Rasch (1995), incorporating the developments of recent years. We dropped the first part of the 1995 book, containing measure and probability theory, because there are excellent books about this, such as Billingsley (2012) and Kallenberg (2002).
Considering the positive response to this book in the statistics community, we decided to present an English version of our book from 2016. We thank Alison Oliver for accepting it into Wiley’s publishing programme.
We assume knowledge from probability theory about exponential families as well as central and non-central t-, χ²- and F-distributions. Because the definition of exponential families is basic for some chapters, it is repeated in this book.
Most authors of books about mathematical statistics assume that data already exist and must be analysed. But we think that the optimal design for collecting data is at least as important as the statistical analysis. Therefore, in addition to statistical analysis, we included the design of experiments. The optimal allocation is described in the chapters on regression analysis. Finally a chapter about experimental designs is added.
For practical calculations with data, we present and use in some parts of the book IBM SPSS Statistics 24 for the statistical analysis, and we thank Dr. Johannes Gladitz (Berlin) for giving us access to it. Unfortunately, it is not possible to switch SPSS to British English; therefore, you will find ‘Analyze’ in the screenshots and in our commands.
The determination of sample sizes can be found together with the description of the method of analysis, and for sample size determination and other design problems, we offer the package OPDOE (Optimal Design of Experiments) under .
We heartily thank Prof. Dr. Rob Verdooren (Wageningen, Netherlands) for checking the correctness of the statistics and Sandra Almgren (Kremmling, CO, USA) for improving the English text.
Billingsley, P. (2012) Probability and Measure, John Wiley & Sons, Inc., New York.
Kallenberg, O. (2002) Foundations of Modern Probability, 2nd edition, Springer, New York.
Rasch, D. (1995) Mathematische Statistik, Johann Ambrosius Barth, Berlin, Heidelberg.
Rasch, D. and Schott, D. (2016) Mathematische Statistik, Wiley VCH, Weinheim.
Elementary statistical computations have been carried out for thousands of years. For example, the arithmetic mean of a number of measurements or observations has been known for a very long time.
Descriptive statistics arose first, starting with the collection of data, for example, in national censuses or in registers of medical records, followed by the compression of these data into statistics or graphical representations (figures). Mathematical statistics developed on the foundation of probability theory from the end of the 19th century on. At the beginning of the 20th century, Karl Pearson and Sir Ronald Aylmer Fisher were notable pioneers of this new discipline. Fisher’s book (1925) was a milestone, providing experimenters with such basic concepts as his well-known maximum likelihood method and analysis of variance as well as the notions of sufficiency and efficiency. An important information measure is still called the Fisher information (see Section 1.4).
Concerning the historical development we do not want to go into detail. We refer interested readers to Stigler (1986, 1990). Instead we will describe the current state of the theory. Nevertheless many stimuli come from real applications. Hence, from time to time we will include real examples.
Although the probability calculus is the foundation of mathematical statistics, many practical problems containing statements about random variables cannot be solved with this calculus alone. For example, we often look for statements about parameters of distribution functions although we know these functions only partly or not at all. Mathematical statistics is considered in many introductory textbooks as the theory of analysing experiments or samples; that is, it is assumed that a random sample (corresponding to Section 1.1) is given. Often it is not considered how to get such a random sample in an optimal way. This is treated later in the design of experiments. In concrete applications, the experiment first has to be planned, and after the experiment is finished, the analysis has to be carried out. In theory, however, it is appropriate to determine the optimal evaluation first, for example, the smallest sample size for a variance-optimal estimator. Hence we proceed in this way and start with the optimal evaluation, and after
Mathematical statistics involves mainly the theory of point estimation, statistical selection theory, the theory of hypothesis testing and the theory of confidence estimation. In these areas theorems are proved, showing which procedures are the best ones under special assumptions.
We wish to make clear that the treatment of mathematical statistics on the one hand and its application to concrete data on the other hand are totally different concepts. Although the same terms often occur, they must not be confused. Strictly speaking, the notions of the empirical sphere (hence of the real world) are related to corresponding models in theory.
If the assumptions for deriving best methods are not fulfilled in practical applications, the question arises how good these best methods still are. Such questions are answered by a part of empirical statistics, namely by simulations. We often find that the assumption of a normal distribution occurring in many theorems is far from being a good model for many data in applications. In recent years simulation has developed into its own branch of mathematics. This is shown by a series of international workshops on simulation. The first to sixth workshops took place in St. Petersburg (Russia) in 1994, 1996, 1998, 2001, 2005 and 2009. The seventh international workshop on simulation took place in Rimini (Italy) in 2013 and the eighth one in Vienna (Austria) in 2015.
Because the strength of assumptions has consequences mainly in hypothesis testing and confidence estimation, we discuss such problems first in Chapter 3, where we introduce the concept of robustness against the strength of assumptions.
In the empirical sciences, one character or several characters simultaneously (character vector) are observed in certain objects (or individuals) of a population. The main task is to draw conclusions from the sample of observed values about the whole set of character values of all objects of this population. The problem is that there are objective or economic reasons that do not permit a complete survey of all character values in the population. We give some examples:
• The costs of registering all character values would be out of all proportion to the value of the statement (for instance, measuring the height of all people worldwide older than 18 years).
• The registration of character values results in the destruction of the objects (destructive materials testing such as the tear resistance of ropes or stockings).
• The set of objects is of a hypothetical nature, for example, because some of them do not exist at the moment of investigation (such as all products of a machine).
We can neglect the few practical cases where all objects of a population can be observed and no more extensive population is demanded, because for them mathematical statistics is not needed. Therefore we assume that a certain part (subset) is chosen from the population to observe a character (or character vector) from which we want to draw conclusions about the whole population. We call such a part a (concrete) sample (of the objects). The set of character values measured for these objects is said to be a (concrete) sample of the character values. Each object of the population possesses such a character value (independent of whether we register the value or not). The set of character values of all objects in the population is called the corresponding statistical population.
A concrete population as well as the (sought-after or relevant) character and therefore also the corresponding statistical population need to be determined uniquely. Populations have to be delimited first of all with respect to space and time. In principle it must be clear for an arbitrary real object whether it belongs to the population or not. In the following we consider some examples:
Population                                   Characters
A  Heifers of a certain breed in a           A2  Body mass of these heifers after 180 days
   certain region in a certain year          A3  Back height of these heifers
B  Inhabitants of a town on a certain day    B1  Blood pressure of these inhabitants at 6.00 o’clock
                                             B2  Age of these inhabitants
It is clear that applying conclusions from a sample to the whole population can be wrong. For example, if the children of a day nursery are chosen from the population B in the table above, then the results for the blood pressure B1 possibly, and for the age B2 without doubt, are not applicable to B. Generally we speak of characters, but if they can have a certain influence on the experimental results, they are also called factors. The (mostly only a few) character values are said to be factor levels, and the combinations of factor levels of several factors are called factor level combinations.
The sample should be representative with respect to all factors that can influence the character of a statistical population. That means the composition of the population should be mirrored in the sample of objects. But that is impossible for small samples and many factor level combinations. For example, there are already about 200 factor level combinations in population B concerning the factors age and sex, which cannot be representatively found in a sample of 100 inhabitants. Therefore we recommend avoiding the notion of ‘representative sample’ because it cannot be defined in a correct way.
Samples should not be assessed according to the elements included but according to the way these elements have been selected. This way of selecting a sample is called the sampling procedure. It can be applied either to the objects as statistical units or to the population of character values (e.g. in a data bank). In the latter case the sample of character values arises immediately. In the first case the character must first be registered for the selected objects. Both procedures (but not necessarily the created samples) are equivalent if the character value is registered for each selected object. This is assumed in this chapter. It is not the case in so-called censored samples, where the character values could not be registered in all units of the experiment. For example, if the determination of lifespans of objects (such as electronic components) is finished at a certain time, the measured values of objects with longer lifespans (longer than the time of determination) are missing.
In the following we do not distinguish between samples of objects and samples of character values; the definitions hold for both.
Definition 1.1 A sampling procedure is a rule for selecting a proper subset, named sample, from a well-defined finite basic set of objects (population, universe). It is said to be at random if each element of the basic set has the same probability p to come into the sample. A (concrete) sample is the result of a sampling procedure. Samples resulting from a random sampling procedure are said to be (concrete) random samples.
There are a lot of random sampling procedures in the theory of samples (see, e.g. Cochran and Boing, 1972; Kauermann and Küchenhoff, 2011; Quatember, 2014) that can be used in practice. Basic sets of objects are mostly called (statistical) populations or synonymously sometimes (statistical) universes in the following.
• Simple sampling, where each element of the population has the same probability to come into the sample.
• Stratified sampling, where random sampling is done within previously defined (disjoint) subclasses (strata) of the population. This kind of sampling is only at random as a whole if the sampling probabilities within the classes are chosen proportionally to the cardinalities of the classes.
• Cluster sampling, where the population is again divided into disjoint subclasses (clusters), but the sampling of objects is done not among the objects of the population itself but among the clusters. In the selected clusters all objects are registered. This kind of selection is often used for area samples. It is only at
• Multistage sampling, where at least two stages of sampling are taken. Here the population is first decomposed into disjoint subsets (primary units). Then random sampling is done in all primary units to get secondary units. Multistage sampling is favourable if the population has a hierarchical structure (, province, towns in the province). It is at random corresponding to Definition 1.1 if the primary units contain the same number of secondary units.
• (Constantly) sequential sampling, where the sample size is not fixed at the beginning of the sampling procedure. At first a small sample is taken and analysed. Then it is decided whether the obtained information is sufficient, for example, to reject or to accept a given hypothesis (see Chapter 3), or whether more information is needed by selecting a further unit.
Both a random sampling (procedure) and an arbitrary sampling (procedure) can result in the same concrete sample. Hence we cannot prove by inspecting the sample itself whether the sample was randomly chosen or not. We have to check the sampling procedure used.
For pure random sampling, Definition 1.1 is applied directly: each object in the population of size N is drawn with the same probability p. The number of objects in a sample is called the sample size, mostly denoted by n.
The most important case of pure random sampling occurs if the objects drawn from a population are not put back. An example is a lottery, where n numbers are drawn from N given numbers (in Germany the well-known Lotto uses N = 49 and n = 6). Using unrestricted sampling of size n, each of the $M = \binom{N}{n}$ possible subsets has the same probability $p = \frac{1}{M}$ to become the sample.
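The count M for the Lotto example can be checked with a few lines of Python. This is only a small sketch; the function name `number_of_samples` is a hypothetical helper introduced here for illustration.

```python
from math import comb


def number_of_samples(N: int, n: int) -> int:
    """Number M of different subsets of size n drawn without replacement from N objects."""
    return comb(N, n)


# Lotto example from the text: N = 49 numbers, n = 6 are drawn.
M = number_of_samples(49, 6)
p = 1 / M  # probability of one particular subset under pure random sampling

print(M)  # 13983816
print(p)
```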
As mentioned before, the sample itself is only at random if a random sampling method was used. But people become suspicious at once if the sample is extreme. If somebody wins the top prize by buying one lot out of 10,000 possible lots, then this is possible although rather unlikely. It can happen at random, or in other words, it can be the result of (a correct) random sampling. But if this person wins the top prize by buying one lot in each of three consecutive lotteries of the mentioned kind, and if it turns out additionally that the person is the brother of the lot seller, then doubts are justified that there was something fishy going on. We would refuse to accept such unlikely events and would suppose that something is wrong. In our lottery case, we would assume that the selection was not at random and that cheats were at work. Nevertheless, there is an extremely small possibility that this event is at random, namely, p = 1/
Incidentally, the strategy of statistical tests in Chapter 3 is to reject models (facts) under which observed events possess a very small probability and instead to accept models under which these events have a larger probability.
Pure random sampling also occurs if the random sample is obtained by replacing the objects immediately after drawing and observing them, so that each object has the same probability to come into the sample using this procedure. Hence, the population always has the same number of objects before a new object is taken. That is only possible if the observation of objects works without destroying or changing them (examples where that is impossible are tensile breaking tests, medical examinations of killed animals, felling of trees and harvesting of food). The discussed method is called simple random sampling with replacement.
If a population of N objects is given and n objects are selected, then n < N for sampling without replacement, whereas for sampling with replacement objects can occur repeatedly in the sample and n > N is possible.
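The difference between the two kinds of pure random sampling can be sketched with Python's standard `random` module. The population of ten numbered objects and the fixed seed are arbitrary assumptions made only for this illustration.

```python
import random

rng = random.Random(1)           # fixed seed, chosen arbitrarily for reproducibility
population = list(range(1, 11))  # N = 10 numbered objects (illustrative)

# Without replacement: every object occurs at most once in the sample.
without_replacement = rng.sample(population, 4)

# With replacement: objects may occur repeatedly, so even n > N is possible.
with_replacement = rng.choices(population, k=15)

print(without_replacement)
print(with_replacement)
```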
A method that can sometimes be realised more easily is systematic sampling with random start. It is applicable if the objects of the finite sampling set are numbered from 1 to N and the sequence is not related to the character considered. If the quotient m = N/n is a natural number, a natural number i between 1 and m is chosen at random, and the sample is collected from the objects with numbers i, m + i, 2m + i, …, (n – 1)m + i. Detailed information about this case and the case where the quotient m is not a natural number can be found in Rasch et al. (2008) in Method (1/31/1210).
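The rule i, m + i, 2m + i, …, (n – 1)m + i can be written down directly. The following is a minimal sketch for the case where m = N/n is a natural number; the function name `systematic_sample` is a hypothetical helper, and the other case is not covered.

```python
import random


def systematic_sample(N, n, rng=random):
    """Object numbers i, m+i, ..., (n-1)m+i for a random start i in 1..m, with m = N/n."""
    if N % n != 0:
        raise ValueError("this sketch only covers the case where N/n is a natural number")
    m = N // n
    i = rng.randint(1, m)  # random start, both ends inclusive
    return [j * m + i for j in range(n)]


# N = 1000 objects and n = 100 give m = 10 possible starts, hence 10 possible samples.
sample = systematic_sample(1000, 100, random.Random(0))
print(sample[:5])
```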
The stratified sampling already mentioned is advantageous if the population of size N is decomposed in a content-relevant manner into s disjoint subpopulations of sizes N1, N2, …, Ns. Of course, the population can sometimes be divided into such subpopulations following the levels of a supposed interfering factor. The subpopulations are called strata. If a sample of size n is to be drawn from such a population, an unrestricted sampling procedure carries the danger that not all strata are considered at all, or at least not in an appropriate way. Therefore in this case a stratified random sampling procedure is favourable. Then partial samples of size ni are collected from the ith stratum (i = 1, 2, …, s), where pure random sampling procedures are used in each stratum. This leads to a random sampling procedure for the whole population if the numbers ni/n are chosen proportional to the numbers Ni/N.
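Why proportional allocation makes stratified sampling random in the sense of Definition 1.1 can be seen from the inclusion probabilities ni/Ni, which are then all equal to n/N. The sketch below uses two hypothetical helper functions and assumes that all n·Ni/N are integers.

```python
from fractions import Fraction


def proportional_allocation(stratum_sizes, n):
    """Partial sample sizes n_i = n * N_i / N (assumed to be integers in this sketch)."""
    N = sum(stratum_sizes)
    sizes = []
    for N_i in stratum_sizes:
        n_i = Fraction(n * N_i, N)
        if n_i.denominator != 1:
            raise ValueError("n * N_i / N is not an integer for every stratum")
        sizes.append(int(n_i))
    return sizes


def inclusion_probabilities(stratum_sizes, sample_sizes):
    """Probability n_i / N_i that an object of stratum i comes into the sample."""
    return [Fraction(n_i, N_i) for n_i, N_i in zip(sample_sizes, stratum_sizes)]


strata = [400, 600]                                 # N = 1000, two strata
n_i = proportional_allocation(strata, 100)          # proportional allocation
print(n_i, inclusion_probabilities(strata, n_i))    # equal probabilities
print(inclusion_probabilities(strata, [50, 50]))    # unequal probabilities
```

With the non-proportional allocation (50, 50) the objects of the smaller stratum are more likely to be selected, so the procedure as a whole is no longer at random.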
While for stratified random sampling objects are selected from each subset, for multistage sampling, subsets or objects are selected at random at each stage as described below. Let the population consist of k disjoint subsets of size N0, the primary units, in the two-stage case. Further, it is supposed that the character values in the single primary units differ only at random, so that objects need not be selected from all primary units. If the desired sample size is n = r n0 with r < k, then, in the first step, r of the k given primary units are selected using a pure random sampling procedure. In the second step n0 objects (secondary units) are chosen from each selected primary unit, again applying pure random sampling. The number of possible samples is $\binom{k}{r}\binom{N_0}{n_0}^{r}$, and each
Table 1.1 Possible samples using different sampling procedures.

Sampling procedure                       Number K of possible samples
Systematic sampling with random start    K = 10
k = 10
k = 20, Ni = 50, i = 1, …, 20
k = 10, Ni = 100, i = 1, …, 10
k = 5, Ni = 200, i = 1, …, 5
k = 2, N1 = 400, N2 = 600
Two-stage sampling
k = 20, N0 = 50, r = 4
k = 20, N0 = 50, r = 5
k = 10, N0 = 100, r = 2
k = 10, N0 = 100, r = 4
k = 5, N0 = 200, r = 2
k = 2, N0 = 500, r = 1
object of the population has the same probability $p = \frac{r}{k} \cdot \frac{n_0}{N_0}$ to reach the sample, in accordance with Definition 1.1.
Example 1.1 A population has N = 1000 objects. A sample of size n = 100 is to be drawn without replacement. Table 1.1 lists the number of possible samples using the discussed sampling methods. The probability of selection for each object is p = 0.1.
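The numbers of possible samples in Example 1.1 can be recomputed with exact integer arithmetic. The sketch below rests on stated assumptions: the rows of Table 1.1 given by stratum sizes Ni are read as stratified sampling with proportional allocation, the two-stage count is taken as the binomial coefficient for choosing r of k primary units times the rth power of the binomial coefficient for choosing n0 of N0 objects, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb, prod

N, n = 1000, 100  # population and sample size of Example 1.1


def simple_random(N, n):
    """All subsets of size n."""
    return comb(N, n)


def systematic(N, n):
    """One sample per possible start i = 1, ..., m with m = N/n."""
    return N // n


def stratified(stratum_sizes, n):
    """Proportional allocation n_i = n * N_i / N (assumed integer) in every stratum."""
    total = sum(stratum_sizes)
    return prod(comb(N_i, n * N_i // total) for N_i in stratum_sizes)


def two_stage(k, N0, r, n):
    """Choose r of k primary units, then n0 = n / r objects in each of them."""
    n0 = n // r
    return comb(k, r) * comb(N0, n0) ** r


def two_stage_probability(k, N0, r, n):
    """Inclusion probability p = (r / k) * (n0 / N0)."""
    return Fraction(r, k) * Fraction(n // r, N0)


print(systematic(N, n))
print(len(str(simple_random(N, n))))            # number of decimal digits of K
print(len(str(stratified([400, 600], n))))
print(len(str(two_stage(20, 50, 4, n))), two_stage_probability(20, 50, 4, n))
```

Printing the number of digits instead of K itself keeps the output readable, since most of these counts are astronomically large.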
In mathematical statistics notions are defined that are used as models (generalisations) for the corresponding empirical notions. The population, which corresponds to a frequency distribution of the character values, is related to the model of a probability distribution. The concrete sample selected by a random procedure is modelled by the realised (theoretical) random sample. These model concepts are adequate if the size N of the population is very large compared with the size n of the sample.
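Definition 1.2 An n-dimensional random variable
\[ \boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T, \quad n \ge 1, \]
with components $\boldsymbol{y}_i$ is said to be a random sample if
• All $\boldsymbol{y}_i$ have the same distribution characterised by the distribution function $F(y_i, \theta) = F(y, \theta)$ with the parameter (vector) $\theta \in \Omega \subseteq R^p$ and
• All $\boldsymbol{y}_i$ are stochastically independent of each other, that is, the distribution function $F(Y, \theta)$ of $\boldsymbol{Y}$ factorises as
\[ F(Y, \theta) = \prod_{i=1}^{n} F(y_i, \theta), \quad \theta \in \Omega \subseteq R^p. \]
The values $Y = (y_1, y_2, \ldots, y_n)^T$ of a random sample $\boldsymbol{Y}$ are called realisations. The set {Y} of all possible realisations of $\boldsymbol{Y}$ is called the sample space.
In this book random variables are printed in bold characters, and the sample space {Y} always belongs to an n-dimensional Euclidean space, that is, $\{Y\} \subseteq R^n$. The function
\[ L(Y, \theta) = \begin{cases} f(Y, \theta) = \dfrac{\partial F(Y, \theta)}{\partial Y}, & \text{for continuous } \boldsymbol{y} \\ p(Y, \theta), & \text{for discrete } \boldsymbol{y} \end{cases} \]
with the probability function p(Y, θ) and the density function f(Y, θ), respectively, is said to be, for given Y and as a function of θ, the likelihood function (of the distribution).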
• Random sample as a random variable $\boldsymbol{Y}$ corresponding to Definition 1.2
• (Concrete) random sample as a subset of a population that was selected by a random sampling procedure.
The realisation Y of a random sample $\boldsymbol{Y}$ is called a realised random sample. The random sample $\boldsymbol{Y}$ is the mathematical model of the simple random sampling procedure, where the concrete random sample and the realised random sample correspond to each other, also in the symbolism.
We describe in this book the ‘classical’ philosophy, where $\boldsymbol{Y}$ is distributed according to the distribution function F(Y, θ) with the fixed (non-random) parameter $\theta \in \Omega \subseteq R^p$. Besides, there is the Bayesian philosophy, where a random θ is supposed, which is distributed a priori with a parameter φ assumed to be known. In the empirical Bayesian method, the a priori distribution is estimated from the data collected.
A random variable involves certain information about the distribution and its parameters. Mainly for large n (say, n > 100), it is useful to condense the components of a random sample in such a way that the fewest possible new random variables contain as much of this information as possible. This vaguely formulated concept will be made more precise step by step, up to the notion of a minimal sufficient statistic. First, we repeat the definition of an exponential family.
The distribution of a random variable $\boldsymbol{y}$ with parameter vector $\theta = (\theta_1, \theta_2, \ldots, \theta_p)^T$ belongs to a k-parametric exponential family if its likelihood function can be written as
\[ f(y, \theta) = h(y)\, e^{\sum_{i=1}^{k} \eta_i(\theta) T_i(y) - B(\theta)}, \]
where the following conditions hold:
• $\eta_i$ and B are real functions of θ, and B does not depend on y.
• The function h(y) is non-negative and does not depend on θ.
The exponential family is in canonical form with the so-called natural parameters $\eta_i$ if its elements can be written as
\[ f(y, \eta) = h(y)\, e^{\sum_{i=1}^{k} \eta_i T_i(y) - A(\eta)} \quad \text{with } \eta = (\eta_1, \ldots, \eta_k)^T. \]
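As a small worked illustration (not taken from the book's text at this point), the two-point distribution with $P(\boldsymbol{y} = 1) = p$ and $P(\boldsymbol{y} = 0) = 1 - p$, 0 < p < 1, can be brought into the two forms above, assuming the sign convention $h(y)\,e^{\sum \eta_i T_i(y) - B(\theta)}$:

```latex
\[
  f(y, p) = p^{y}(1-p)^{1-y}
          = \exp\left\{ y \ln\frac{p}{1-p} + \ln(1-p) \right\},
  \qquad y \in \{0, 1\}.
\]
% Hence k = 1, h(y) = 1, T_1(y) = y,
% \eta_1(p) = \ln\frac{p}{1-p} and B(p) = -\ln(1-p).
% In canonical form, with the natural parameter \eta = \ln\frac{p}{1-p},
% we have 1 - p = \frac{1}{1 + e^{\eta}} and therefore
\[
  f(y, \eta) = \exp\left\{ \eta\, y - A(\eta) \right\},
  \qquad A(\eta) = \ln\left(1 + e^{\eta}\right).
\]
```

So the two-point distribution is a one-parametric exponential family with T(y) = y.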
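Let $(P_\theta, \theta \in \Omega)$ be a family of distributions of random variables $\boldsymbol{y}$ with the distribution function F(y, θ), θ ∈ Ω. The realisations $Y = (y_1, \ldots, y_n)^T$ of the random sample
\[ \boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T, \]
whose components $\boldsymbol{y}_i$ are distributed as $\boldsymbol{y}$ itself, lie in the sample space {Y}. According to Definition 1.2 the distribution function F(Y, θ) of a random sample $\boldsymbol{Y}$ is, just like F(y, θ), uniquely determined.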
Definition 1.3 A measurable mapping $M = M(Y) = [M_1(Y), \ldots, M_r(Y)]^T$, r ≤ n, of {Y} onto a space {M}, which does not depend on θ ∈ Ω, is called a statistic.
Definition 1.4 A statistic $\boldsymbol{M}$ is said to be sufficient relative to a distribution family $(P_\theta, \theta \in \Omega)$ or relative to θ ∈ Ω, respectively, if the conditional distribution of a random sample $\boldsymbol{Y}$ is independent of θ for given $\boldsymbol{M} = M(\boldsymbol{Y}) = M(Y)$.
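Example 1.2 Let the components of a random sample $\boldsymbol{Y}$ follow a two-point distribution with the values 1 and 0. Further, let $P(\boldsymbol{y}_i = 1) = p$ and $P(\boldsymbol{y}_i = 0) = 1 - p$, 0 < p < 1. Then $\boldsymbol{M} = M(\boldsymbol{Y}) = \sum_{i=1}^{n} \boldsymbol{y}_i$ is sufficient relative to $\theta = p \in (0, 1) = \Omega$. To show this, we have to prove that $P\left(\boldsymbol{Y} = Y \mid \sum_{i=1}^{n} \boldsymbol{y}_i = M\right)$ is independent of p. Now
\[ P(\boldsymbol{Y} = Y \mid M) = \frac{P(\boldsymbol{Y} = Y, \boldsymbol{M} = M)}{P(\boldsymbol{M} = M)}, \quad M = 0, 1, \ldots, n. \]
We know from probability theory that $\boldsymbol{M} = M(\boldsymbol{Y}) = \sum_{i=1}^{n} \boldsymbol{y}_i$ is binomially distributed with the parameters n and p. Hence it follows that
\[ P(\boldsymbol{M} = M) = \binom{n}{M} p^{M} (1-p)^{n-M}, \quad M = 0, 1, \ldots, n. \]
Further, with $y_i = 0$ or $y_i = 1$ and $A(M) = \{Y \mid M(Y) = M\}$, we get
\[ P(\boldsymbol{Y} = Y, M(\boldsymbol{Y}) = M) = P(\boldsymbol{y}_1 = y_1, \ldots, \boldsymbol{y}_n = y_n)\, I_{A(M)}(Y) = \prod_{i=1}^{n} \binom{1}{y_i} p^{y_i} (1-p)^{1-y_i}\, I_{A(M)}(Y) = p^{\sum_{i=1}^{n} y_i} (1-p)^{n - \sum_{i=1}^{n} y_i}\, I_{A(M)}(Y) = p^{M} (1-p)^{n-M}\, I_{A(M)}(Y). \]
Consequently, for $Y \in A(M)$, $P(\boldsymbol{Y} = Y \mid M) = \dfrac{1}{\binom{n}{M}}$, and this is independent of p.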
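For a small sample size the statement of Example 1.2 can be verified by complete enumeration. The following sketch uses exact fractions; the function name `conditional_probabilities` and the choice n = 4 are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product
from math import comb


def conditional_probabilities(n, p):
    """P(Y = Y | sum of components = M(Y)) for every 0/1 vector Y of length n."""
    # Joint probability of each realisation under independent two-point components.
    joint = {Y: p ** sum(Y) * (1 - p) ** (n - sum(Y)) for Y in product((0, 1), repeat=n)}
    # Probability of each value M of the statistic, obtained by summation.
    marginal = {}
    for Y, prob in joint.items():
        marginal[sum(Y)] = marginal.get(sum(Y), 0) + prob
    return {Y: prob / marginal[sum(Y)] for Y, prob in joint.items()}


n = 4
cond_a = conditional_probabilities(n, Fraction(1, 3))
cond_b = conditional_probabilities(n, Fraction(4, 5))

# The conditional distribution is the same for both values of p ...
print(cond_a == cond_b)  # True
# ... and equals 1 / binom(n, M) for every realisation with sum M.
print(all(q == Fraction(1, comb(n, sum(Y))) for Y, q in cond_a.items()))  # True
```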
This way of proving sufficiency is rather laborious, but we can also apply it to continuous distributions, as the next example will show.
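Example 1.3 Let the components $\boldsymbol{y}_i$ of a random sample $\boldsymbol{Y}$ of size n be distributed as N(μ, 1) with expected value μ and variance σ² = 1. Then $\boldsymbol{M} = \sum \boldsymbol{y}_i$ is sufficient relative to $\mu \in R^1 = \Omega$. To show this we first remark that $\boldsymbol{Y}$ is distributed as $N(\mu e_n, I_n)$. Then we apply the one-to-one transformation
\[ \boldsymbol{Z} = A\boldsymbol{Y}, \quad A = \begin{pmatrix} 1 & e_{n-1}^T \\ -e_{n-1} & I_{n-1} \end{pmatrix}, \]
where |A| = n. We write $\boldsymbol{Z} = (\boldsymbol{z}_1, \boldsymbol{Z}_2) = \left(\sum \boldsymbol{y}_i, \boldsymbol{y}_2 - \boldsymbol{y}_1, \ldots, \boldsymbol{y}_n - \boldsymbol{y}_1\right)$ and recognise that
\[ \operatorname{cov}(\boldsymbol{Z}_2, \boldsymbol{z}_1) = \operatorname{cov}(\boldsymbol{Z}_2, \boldsymbol{M}) = \operatorname{cov}\left((-e_{n-1}, I_{n-1})\boldsymbol{Y},\; e_n^T \boldsymbol{Y}\right) = 0_{n-1}. \]
Considering the assumption of normal distribution, the variables $\boldsymbol{M}$ and $\boldsymbol{Z}_2$ are stochastically independent. Consequently $\boldsymbol{Z}_2$, but also $\boldsymbol{Z}_2 \mid M$ and $\boldsymbol{Z} \mid M$, are independent of μ. Taking into account that the mapping $Z = AY$ is one-to-one, $\boldsymbol{Y} \mid M$ is also independent of μ. Hence $\boldsymbol{M} = \sum \boldsymbol{y}_i$ is sufficient relative to $\mu \in R^1$. If $\boldsymbol{M} = \sum \boldsymbol{y}_i$ is sufficient and c ≠ 0 is a real number, then $c\boldsymbol{M}$ is also sufficient; that is, $\frac{1}{n}\sum \boldsymbol{y}_i = \bar{\boldsymbol{y}}$ is sufficient.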
But sufficiency plays such a crucial part in mathematical statistics that we need simpler methods for proving sufficiency and mainly for finding sufficient statistics. The following theorem is useful in this direction.
Theorem 1.1 (Decomposition Theorem)
Let a distribution family $(P_\theta, \theta \in \Omega)$ of a random sample $\boldsymbol{Y}$ be given that is dominated by a finite measure ν. The statistic M(Y) is sufficient relative to θ if the Radon–Nikodym density $f_\theta$ of $P_\theta$ with respect to ν can be written as
\[ f_\theta(Y) = g_\theta[M(Y)]\, h(Y) \qquad (1.1) \]
ν-almost everywhere. Here the following holds:
• The ν-integrable function $g_\theta$ is non-negative and measurable.
• h is non-negative and h(Y) = 0 is fulfilled only for a set of $P_\theta$-measure 0.
The general proof was given by Halmos and Savage (1949); it can also be found, for example, in Bahadur (1955) or Lehmann and Romano (2008).
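In the present book we work only with discrete and continuous probability distributions satisfying the assumptions of Theorem 1.1. A proof of the theorem for such distributions is given in Rasch (1995). We do not want to repeat it here. For discrete distributions Theorem 1.1 means that the probability function is of the form
\[ p(Y, \theta) = g[M(Y), \theta]\, h(Y). \qquad (1.2) \]
For continuous distributions the density function is of the form
\[ f(Y, \theta) = g[M(Y), \theta]\, h(Y). \qquad (1.3) \]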
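Corollary 1.1 If the distribution family (P∗(θ), θ ∈ Ω) of the random variable $\boldsymbol{y}$ is a k-parametric exponential family with the natural parameter η and the likelihood function
\[ L(y, \eta) = h(y)\, e^{\sum_{j=1}^{k} \eta_j T_j(y) - A(\eta)}, \]
then, denoting the random sample by $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T$,
\[ M(\boldsymbol{Y}) = \left( \sum_{i=1}^{n} T_1(\boldsymbol{y}_i), \ldots, \sum_{i=1}^{n} T_k(\boldsymbol{y}_i) \right)^T \]
is sufficient relative to θ.
Proof: It is
\[ L(Y, \eta) = \prod_{i=1}^{n} h(y_i)\; e^{\sum_{j=1}^{k} \eta_j \sum_{i=1}^{n} T_j(y_i) - nA(\eta)}, \]
which is of the form (1.2) and (1.3), respectively, where $h(Y) = \prod_{i=1}^{n} h(y_i)$ and θ = η.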
Definition 1.5 Two likelihood functions, $L_1(Y_1, \theta)$ and $L_2(Y_2, \theta)$, are said to be equivalent, denoted by $L_1 \sim L_2$, if
\[ L_1(Y_1, \theta) = a(Y_1, Y_2)\, L_2(Y_2, \theta) \]
with a function $a(Y_1, Y_2)$ that is independent of θ.
Corollary 1.2 $M(Y)$ is sufficient relative to $\theta$ if and only if (iff) the likelihood function $L_M(M, \theta)$ of $M = M(Y)$ is equivalent to the likelihood function of a random sample $Y$.
Proof: If $M(Y)$ is sufficient relative to $\theta$, then because of
$$L_M(M, \theta) = a(Y)\, L(Y, \theta), \quad a(Y) > 0, \qquad (1.8)$$
$L_M(M, \theta)$ has, together with $L(Y, \theta)$, the form (1.1). Conversely, (1.8) implies that the conditional distribution of a random sample $Y$ given $M(Y) = M$ is independent of $\theta$.
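Example 1.4 Let the components $y_i$ of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be distributed as $N(\mu, 1)$. Then it is
$$L(Y, \mu) = (2\pi)^{-n/2} \exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{n}(y_i - \mu)^2\Big\} = (2\pi)^{-n/2} \exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{n}(y_i - \bar{y})^2\Big\} \exp\Big\{-\tfrac{n}{2}(\bar{y} - \mu)^2\Big\}.$$
Since $M(Y) = \bar{y}$ is distributed as $N(\mu, \frac{1}{n})$, we get
$$L_M(\bar{y}, \mu) = \sqrt{\frac{n}{2\pi}}\, \exp\Big\{-\tfrac{n}{2}(\bar{y} - \mu)^2\Big\}.$$
Hence $L_M(\bar{y}, \mu) \sim L(Y, \mu)$ holds, and $\bar{y}$ is sufficient relative to $\mu$.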
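The equivalence claimed in Example 1.4 can also be checked numerically: for a fixed sample, the ratio of the sample likelihood to the likelihood of $\bar{y}$ should take the same value for every $\mu$. The sketch below is an illustration only; the data values and the function names `lik_sample` and `lik_mean` are hypothetical, and the two densities are the standard ones for $N(\mu, 1)$ and $N(\mu, 1/n)$.

```python
import math

def lik_sample(y, mu):
    """Likelihood of an N(mu, 1) random sample y."""
    n = len(y)
    return (2 * math.pi) ** (-n / 2) * math.exp(-0.5 * sum((v - mu) ** 2 for v in y))

def lik_mean(ybar, n, mu):
    """Density of the sample mean, which is distributed as N(mu, 1/n)."""
    return math.sqrt(n / (2 * math.pi)) * math.exp(-0.5 * n * (ybar - mu) ** 2)

y = [0.3, -1.2, 2.5, 0.9]          # arbitrary illustrative data
ybar = sum(y) / len(y)
mus = (-2.0, 0.0, 0.7, 3.0)        # trial values of mu
ratios = [lik_sample(y, mu) / lik_mean(ybar, len(y), mu) for mu in mus]
```

All entries of `ratios` agree up to rounding error, so the factor linking the two likelihoods depends on the data only, which is what the equivalence of Definition 1.5 requires.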
Corollary 1.3 If $c > 0$ is a real number chosen independently of $\theta$ and $M(Y)$ is sufficient relative to $\theta$, then $cM(Y)$ is also sufficient relative to $\theta$.
Hence, for example, by putting $M = \sum y_i$ and $c = \frac{1}{n}$, also $\frac{1}{n}\sum y_i = \bar{y}$ is sufficient.
The question arises whether, among the statistics sufficient relative to the distribution family $(P^*(\theta), \theta \in \Omega)$, there are some that are minimal in a certain sense, that is, that contain as few components as possible. The following example shows that this question is no mere invention.
Example 1.5 Let $(P^*(\theta), \theta \in \Omega)$ be the family of $N(0, \sigma^2)$-normal distributions ($\sigma > 0$). We consider the statistics
$$M_1(Y) = Y,$$
$$M_2(Y) = (y_1^2, \ldots, y_n^2)^T,$$
$$M_3(Y) = [\ldots],$$
$$M_4(Y) = \sum_{i=1}^{n} y_i^2$$
of a random sample $Y$ of size $n$, which are all sufficient relative to $\sigma^2$. This can easily be shown using Corollary 1.1 of Theorem 1.1 (decomposition theorem). The likelihood functions of $M_1(Y)$ and $Y$ are identical (and therefore equivalent).
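Since both the $y_i$ and the $y_i^2$ are independent and the $y_i^2/\sigma^2 = \chi_i^2$ are distributed as CS(1) ($\chi^2$-distribution with 1 degree of freedom; see Appendix A: Symbolism), it follows after the transformation $y_i^2 = \sigma^2 \chi_i^2$:
$$L_M(M_2(Y), \sigma^2) = \prod_{i=1}^{n} (2\pi\sigma^2 y_i^2)^{-1/2} \exp\Big\{-\frac{y_i^2}{2\sigma^2}\Big\} \sim L(Y, \sigma^2).$$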
Analogously we proceed with $M_3(Y)$ and $M_4(Y)$.
Obviously, $M_4(Y)$ is the most extensive compression of the components of a random sample $Y$, and therefore it is preferable to the other statistics.
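The sense in which $M_4(Y)$ compresses more than $M_2(Y)$ can be illustrated with a short Python sketch. It assumes a zero-mean normal model (so that $y_i^2/\sigma^2$ is CS(1), as used above); the sample values and the function names are hypothetical and only serve the illustration.

```python
import math

def M2(Y):
    """Vector of squared components."""
    return tuple(y * y for y in Y)

def M4(Y):
    """Sum of squared components."""
    return sum(y * y for y in Y)

def M4_from_M2(m2):
    """M4 is a function of M2: simply add up the components."""
    return sum(m2)

def lik(Y, s2):
    """Likelihood of a zero-mean normal sample with variance s2 (assumed model)."""
    n = len(Y)
    return (2 * math.pi * s2) ** (-n / 2) * math.exp(-sum(y * y for y in Y) / (2 * s2))

Y, Y0 = (3.0, 4.0), (5.0, 0.0)       # different samples with the same sum of squares
same_M4 = M4(Y) == M4(Y0)            # both sums of squares equal 25
different_M2 = M2(Y) != M2(Y0)       # M2 still distinguishes the two samples
same_lik = all(math.isclose(lik(Y, s2), lik(Y0, s2)) for s2 in (0.5, 1.0, 4.0))
```

The two samples give the same likelihood for every trial variance although $M_2$ separates them, so $M_2$ keeps detail that carries no information about $\sigma^2$, while $M_4$ can always be recovered from $M_2$ but not the other way round.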
Definition 1.6 A statistic $M^*(Y)$ sufficient relative to $\theta$ is said to be minimal sufficient relative to $\theta$ if it can be represented as a function of every other sufficient statistic $M(Y)$.
Hence, $M_4(Y)$ can be written as a function of each sufficient statistic of this example. This is not true for $M_1(Y)$, $M_2(Y)$ and $M_3(Y)$; they are not functions of $M_4(Y)$. $M_4(Y)$ is therefore the only statistic of Example 1.5 that could be minimal sufficient relative to $\sigma^2$. We will see that it indeed has this property. But how can we show minimal sufficiency? We recognise that a statistic $M(Y)$ decomposes the sample space into disjoint subsets such that all $Y$ for which $M(Y)$ takes the same value $M$ belong to the same subset. Conversely, a given decomposition defines a statistic. We now present a decomposition that is shown to generate a minimal sufficient statistic.
Definition 1.7 Let $Y_0 \in \{Y\}$ be a fixed point in the sample space (a certain value of a realised random sample), which contains the realisations of a random sample $Y$ with components from a family $(P^*(\theta), \theta \in \Omega)$ of probability distributions. The likelihood function $L(Y, \theta)$ generates by
$$M(Y_0) = \{Y : L(Y, \theta) \sim L(Y_0, \theta)\} \qquad (1.12)$$
a subset in $\{Y\}$. If $Y_0$ runs through the whole sample space $\{Y\}$, then a certain decomposition is generated. This decomposition is called the likelihood decomposition, and the corresponding statistic $M_L(Y)$, satisfying $M_L(Y) = \text{const.}$ for all $Y \in M(Y_0)$ and each $Y_0$, is called the likelihood statistic.
Before we construct minimal sufficient statistics for some examples by this method, we state
Theorem 1.2 The likelihood statistic $M_L(Y)$ is minimal sufficient relative to $\theta$.
Proof: Considering the likelihood statistic $M_L(Y)$, it holds
$$M_L(Y_1) = M_L(Y_2)$$
for $Y_1, Y_2 \in \{Y\}$ iff $L(Y_1, \theta) \sim L(Y_2, \theta)$ is fulfilled. Hence, $L(Y, \theta)$ is a function of $M_L(Y)$ having the form
$$L(Y, \theta) = a(Y)\, g^*\{M_L(Y), \theta\}. \qquad (1.13)$$
Therefore $M_L(Y)$ is sufficient relative to $\theta$ by Theorem 1.1 (decomposition theorem). Now let $M(Y)$ be any other statistic that is sufficient relative to $\theta$, let two points $Y_1, Y_2 \in \{Y\}$ satisfy $M(Y_1) = M(Y_2)$, and let $L(Y_i, \theta) > 0$ hold for $i = 1, 2$. Then Theorem 1.1 again supplies
$$L(Y_1, \theta) = h(Y_1)\, g\{M(Y_1), \theta\} = h(Y_1)\, g\{M(Y_2), \theta\}$$
because of $M(Y_1) = M(Y_2)$, and
$$L(Y_2, \theta) = h(Y_2)\, g\{M(Y_2), \theta\} \quad \text{or equivalently} \quad g\{M(Y_2), \theta\} = \frac{L(Y_2, \theta)}{h(Y_2)}.$$
Hence
$$L(Y_1, \theta) = \frac{h(Y_1)}{h(Y_2)}\, L(Y_2, \theta), \quad h(Y_2) > 0,$$
which means $L(Y_1, \theta) \sim L(Y_2, \theta)$. But this is just the condition for $M_L(Y_1) = M_L(Y_2)$. Consequently $M_L(Y)$ is a function of $M(Y)$, independent of how $M(Y)$ is chosen; that is, it is minimal sufficient.
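Example 1.6 Let the components $y_i$ of a random sample $Y$ follow a binomial distribution $B(N, p)$, $N$ fixed and $0 < p < 1$. We look for a statistic that is minimal sufficient relative to $p$. The likelihood function is
$$L(Y, p) = \prod_{i=1}^{n} \binom{N}{y_i} p^{y_i} (1 - p)^{N - y_i}, \quad y_i = 0, 1, \ldots, N.$$
For all $Y_0 = (y_{01}, \ldots, y_{0n})^T \in \{Y\}$ with $L(Y_0, p) > 0$, it is
$$\frac{L(Y, p)}{L(Y_0, p)} = \frac{\prod_{i=1}^{n} \binom{N}{y_i}}{\prod_{i=1}^{n} \binom{N}{y_{0i}}} \left(\frac{p}{1 - p}\right)^{\sum_{i=1}^{n} (y_i - y_{0i})}.$$
Therefore $M(Y_0)$ is also defined by $M(Y_0) = \{Y : \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} y_{0i}\}$, since just there $L(Y, p) \sim L(Y_0, p)$ holds. Hence $M(Y) = \sum_{i=1}^{n} y_i$ is a minimal sufficient statistic.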
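For a small sample space the likelihood decomposition of Example 1.6 can be enumerated directly. The following Python sketch groups all samples whose likelihood ratio takes the same value at a few trial values of $p$; the values of `N`, `n` and `P_GRID` are illustrative assumptions, and comparing on a finite grid is a numerical stand-in for the exact condition of Definition 1.5, not a proof.

```python
from itertools import product
from math import comb, isclose

N, n = 3, 2                      # small illustrative values
P_GRID = (0.2, 0.5, 0.8)         # trial values of p in (0, 1)

def lik(Y, p):
    """Binomial likelihood of the sample Y for success probability p."""
    out = 1.0
    for y in Y:
        out *= comb(N, y) * p ** y * (1 - p) ** (N - y)
    return out

def equivalent(Y1, Y2):
    """Numerical check of L(Y1, p) ~ L(Y2, p): the ratio is the same at every trial p."""
    ratios = [lik(Y1, p) / lik(Y2, p) for p in P_GRID]
    return all(isclose(r, ratios[0], rel_tol=1e-9) for r in ratios)

def likelihood_decomposition():
    """Partition the sample space {0, ..., N}^n into likelihood-equivalence classes."""
    classes = []
    for Y in product(range(N + 1), repeat=n):
        for cls in classes:
            if equivalent(Y, cls[0]):
                cls.append(Y)
                break
        else:
            classes.append([Y])
    return classes

classes = likelihood_decomposition()
sums_per_class = [{sum(Y) for Y in cls} for cls in classes]
```

Each class contains samples with a single common value of $\sum y_i$, and there is one class per possible sum, so the enumerated decomposition coincides with the one generated by $M(Y) = \sum y_i$.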
Example 1.7 Let the components $y_i$ of a random sample $Y = (y_1, y_2, \ldots, y_n)^T$ be gamma distributed. Then for $y_i > 0$ we get the likelihood function
$$L(Y, a, k) = \frac{a^{nk}}{[\Gamma(k)]^n} \prod_{i=1}^{n} y_i^{k-1}\, e^{-a \sum_{i=1}^{n} y_i}.$$
For all $Y_0 = (y_{01}, \ldots, y_{0n})^T \in \{Y\}$ with $L(Y_0, a, k) > 0$, it is
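$$\frac{L(Y, a, k)}{L(Y_0, a, k)} = \prod_{i=1}^{n} \left(\frac{y_i}{y_{0i}}\right)^{k-1} e^{-a \sum_{i=1}^{n} (y_i - y_{0i})}.$$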
For given $a$ the product $\prod_{i=1}^{n} y_i$ is minimal sufficient relative to $k$. If $k$ is known, then $\sum_{i=1}^{n} y_i$ is minimal sufficient relative to $a$. If both $a$ and $k$ are unknown parameters, then $(\prod_{i=1}^{n} y_i, \sum_{i=1}^{n} y_i)$ is minimal sufficient relative to $(a, k)$.
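The three statements of Example 1.7 can be probed numerically. The sketch below assumes the rate parameterisation of the gamma density, $a^k y^{k-1} e^{-ay}/\Gamma(k)$, which is an assumption about the notation; the sample values and the names `log_lik` and `log_ratio` are hypothetical.

```python
import math

def log_lik(Y, a, k):
    """Log-likelihood under the assumed density a**k / Gamma(k) * y**(k-1) * exp(-a*y)."""
    n = len(Y)
    return (n * k * math.log(a) - n * math.lgamma(k)
            + (k - 1) * sum(math.log(y) for y in Y) - a * sum(Y))

def log_ratio(Y, Y0, a, k):
    """Logarithm of L(Y, a, k) / L(Y0, a, k)."""
    return log_lik(Y, a, k) - log_lik(Y0, a, k)

Y0 = (2.0, 2.0, 9.0)               # sum 13, product 36
Y_both = (1.0, 6.0, 6.0)           # same sum 13 and same product 36
Y_sum_only = (3.0, 4.0, 6.0)       # same sum 13, but product 72

A_VALUES = (0.5, 1.0, 2.0)
K_VALUES = (0.5, 1.0, 3.0)

# Equal sum and equal product: the ratio is constant (here 1) in both parameters.
both = [log_ratio(Y_both, Y0, a, k) for a in A_VALUES for k in K_VALUES]

# Equal sum only: the ratio is free of a for fixed k, but still changes with k.
by_k = {k: [log_ratio(Y_sum_only, Y0, a, k) for a in A_VALUES] for k in K_VALUES}
```

With both statistics matched the log ratio is zero up to rounding, while matching the sum alone leaves a dependence on $k$; this mirrors the distinction between the case of known $k$ and the case where both parameters are unknown.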
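Theorem 1.3 If $(P^*(\theta), \theta \in \Omega)$ is a $k$-parametric exponential family with likelihood function in canonical form
$$L(y, \theta) = e^{\sum_{j=1}^{k} \eta_j M_j(y) - A(\eta)}\, h(y),$$
where the dimension of the parameter space is $k$ (i.e. the $\eta_1, \ldots, \eta_k$ are linearly independent), then
$$M(Y) = \Big(\sum_{i=1}^{n} M_1(y_i), \ldots, \sum_{i=1}^{n} M_k(y_i)\Big)^T$$
is minimal sufficient relative to $(P^*(\theta), \theta \in \Omega)$.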
Proof: The sufficiency of $M(Y)$ follows from Corollary 1.1 of the decomposition theorem (Theorem 1.1), and the minimal sufficiency follows from the fact that $M(Y)$ is the likelihood statistic, because $L(Y, \theta) \sim L(Y_0, \theta)$ holds if
$$\sum_{j=1}^{k} \eta_j \sum_{i=1}^{n} [M_j(y_i) - M_j(y_{0i})] = 0.$$
Because of the linear independence of the $\eta_j$, this is only the case if $M(Y) = M(Y_0)$ is fulfilled.
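As an illustration of Theorem 1.3, consider the univariate normal family with both parameters unknown, a two-parametric exponential family with $M_1(y) = y$ and $M_2(y) = y^2$. This choice of family, the sample values and the function names in the following Python sketch are assumptions made for the illustration and are not taken from the text.

```python
import math

def log_lik(Y, mu, s2):
    """Log-likelihood of a normal sample with mean mu and variance s2."""
    n = len(Y)
    return -0.5 * n * math.log(2 * math.pi * s2) - sum((y - mu) ** 2 for y in Y) / (2 * s2)

def M(Y):
    """Candidate minimal sufficient statistic (sum of y_i, sum of y_i squared)."""
    return (sum(Y), sum(y * y for y in Y))

Y0 = (2, 3, 7)                     # M(Y0) = (12, 62)
Y_same = (1, 5, 6)                 # M(Y_same) = (12, 62) as well
Y_diff = (3, 4, 5)                 # same sum 12, but sum of squares 50

GRID = [(mu, s2) for mu in (-1.0, 0.0, 2.5) for s2 in (0.5, 1.0, 4.0)]

# Same value of M: the log likelihood ratio vanishes for all parameter values.
same = [log_lik(Y_same, mu, s2) - log_lik(Y0, mu, s2) for mu, s2 in GRID]

# Different value of M: the log ratio changes with the variance.
diff = [log_lik(Y_diff, mu, s2) - log_lik(Y0, mu, s2) for mu, s2 in GRID]
```

Samples with equal $M(Y)$ are likelihood equivalent, whereas the sample with a different sum of squares gives a ratio that varies with the variance, so it falls into a different set of the likelihood decomposition.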
Example 1.8 Let $(P^*(\theta), \theta \in \Omega)$ be the family of two-dimensional normal distributions with the random variable $(x, y)^T$, the expectation $\mu = (\mu_x, \mu_y)^T$ and the
