
Biostatistical Analysis

Pearson Education Limited

Edinburgh Gate

Harlow

Essex CM20 2JE

England and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsoned.co.uk

© Pearson Education Limited 2014

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

ISBN 10: 1-292-02404-6

ISBN 13: 978-1-292-02404-2

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

Printed in the United States of America


Data: Types and Presentation

1 TYPES OF BIOLOGICAL DATA

2 ACCURACY AND SIGNIFICANT FIGURES

3 FREQUENCY DISTRIBUTIONS

4 CUMULATIVE FREQUENCY DISTRIBUTIONS

Scientific study involves the systematic collection, organization, analysis, and presentation of knowledge. Many investigations in the biological sciences are quantitative, where knowledge is in the form of numerical observations called data. (One numerical observation is a datum.*) In order for the presentation and analysis of data to be valid and useful, we must use methods appropriate to the type of data obtained, to the design of the data collection, and to the questions asked of the data; and the limitations of the data, of the data collection, and of the data analysis should be appreciated when formulating conclusions.

The word statistics is derived from the Latin for “state,” indicating the historical importance of governmental data gathering, which related principally to demographic information (including census data and “vital statistics”) and often to their use in military recruitment and tax collecting.†

The term statistics is often encountered as a synonym for data: One hears of college enrollment statistics (such as the numbers of newly admitted students, numbers of senior students, numbers of students from various geographic locations), statistics of a basketball game (such as how many points were scored by each player, how many fouls were committed), labor statistics (such as numbers of workers unemployed, numbers employed in various occupations), and so on. Hereafter, this use of the word statistics will not appear in this text. Instead, it will be used in its other common manner: to refer to the orderly collection, analysis, and interpretation of data with a view to objective evaluation of conclusions based on the data.

Statistics applied to biological problems is simply called biostatistics or, sometimes, biometry‡ (the latter term literally meaning “biological measurement”). Although the field of statistics has roots extending back hundreds of years, its development began in earnest in the late nineteenth century, and a major impetus from early in this development has been the need to examine biological data.

*The term data is sometimes seen as a singular noun meaning “numerical information.” This book refrains from that use.

† Peters (1987: 79) and Walker (1929: 32) attribute the first use of the term statistics to a German professor, Gottfried Achenwall (1719–1772), who used the German word Statistik in 1749, and the first published use of the English word to John Sinclair (1754–1835) in 1791.

‡ The word biometry, which literally means “biological measurement,” had, since the nineteenth century, been found in several contexts (such as demographics and, later, quantitative genetics; Armitage, 1985; Stigler, 2000), but using it to mean the application of statistical methods to biological information apparently was conceived between 1892 and 1901 by Karl Pearson, along with the name Biometrika for the still-important English journal he helped found; and it was first published in the inaugural issue of this journal in 1901 (Snedecor, 1954). The Biometrics Section of the American

From Chapter 1 of Biostatistical Analysis, Fifth Edition, Jerrold H. Zar. Copyright © 2010 by Pearson Education, Inc. Publishing as Pearson Prentice Hall. All rights reserved.

Statistical considerations can aid in the design of experiments intended to collect data and in the setting up of hypotheses to be tested. Many biologists attempt the analysis of their research data only to find that too few data were collected to enable reliable conclusions to be drawn, or that much extra effort was expended in collecting data that cannot be of ready use in the analysis of the experiment. Thus, a knowledge of basic statistical principles and procedures is important as research questions are formulated before an experiment and data collection are begun.

Once data have been obtained, we may organize and summarize them in such a way as to arrive at their orderly and informative presentation. Such procedures are often termed descriptive statistics. For example, measurements might be made of the heights of all 13-year-old children in a school district, perhaps determining an average height for each sex. However, perhaps it is desired to make some generalizations from these data. We might, for example, wish to make a reasonable estimate of the heights of all 13-year-olds in the state. Or we might wish to conclude whether the 13-year-old boys in the state are on the average taller than the girls of that age. The ability to make such generalized conclusions, inferring characteristics of the whole from characteristics of its parts, lies within the realm of inferential statistics.

1 TYPES OF BIOLOGICAL DATA

A characteristic (for example, size, color, number, chemical composition) that may differ from one biological entity to another is termed a variable (or, sometimes, a variate∗), and several different kinds of variables may be encountered by biologists. Because the appropriateness of descriptive or inferential statistical procedures depends upon the properties of the data obtained, it is desirable to distinguish among the principal kinds of data. The classification used here is that which is commonly employed (Senders, 1958; Siegel, 1956; Stevens, 1946, 1968). However, not all data fit neatly into these categories and some data may be treated differently depending upon the questions asked of them.

(a) Data on a Ratio Scale. Imagine that we are studying a group of plants, that the heights of the plants constitute a variable of interest, and that the number of leaves per plant is another variable under study. It is possible to assign a numerical value to the height of each plant, and counting the leaves allows a numerical value to be recorded for the number of leaves on each plant. Regardless of whether the height measurements are recorded in centimeters, inches, or other units, and regardless of whether the leaves are counted in a number system using base 10 or any other base, there are two fundamentally important characteristics of these data.

First, there is a constant size interval between adjacent units on the measurement scale. That is, the difference in height between a 36-cm and a 37-cm plant is the same as the difference between a 39-cm and a 40-cm plant, and the difference between eight and ten leaves is equal to the difference between nine and eleven leaves.

Statistical Association was established in 1938, successor to the Committee on Biometrics of that organization, and began publishing the Biometrics Bulletin in 1945, which transformed in 1947 into the journal Biometrics, a journal retaining major importance today. More recently, the term biometrics has become widely used to refer to the study of human physical characteristics (including facial and hand characteristics, fingerprints, DNA profiles, and retinal patterns) for identification purposes.

∗ “Variate” was first used by R. A. Fisher (1925: 5; David, 1995).

Second, it is important that there exists a zero point on the measurement scale and that there is a physical significance to this zero. This enables us to say something meaningful about the ratio of measurements. We can say that a 30-cm (11.8-in.) tall plant is half as tall as a 60-cm (23.6-in.) plant, and that a plant with forty-five leaves has three times as many leaves as a plant with fifteen.

Measurement scales having a constant interval size and a true zero point are said to be ratio scales of measurement. Besides lengths and numbers of items, ratio scales include weights (mg, lb, etc.), volumes (cc, cu ft, etc.), capacities (ml, qt, etc.), rates (cm/sec, mph, mg/min, etc.), and lengths of time (hr, yr, etc.).

(b) Data on an Interval Scale. Some measurement scales possess a constant interval size but not a true zero; they are called interval scales. A common example is that of the two common temperature scales: Celsius (C) and Fahrenheit (F). We can see that the same difference exists between 20°C (68°F) and 25°C (77°F) as between 5°C (41°F) and 10°C (50°F); that is, the measurement scale is composed of equal-sized intervals. But it cannot be said that a temperature of 40°C (104°F) is twice as hot as a temperature of 20°C (68°F); that is, the zero point is arbitrary.∗ (Temperature measurements on the absolute, or Kelvin [K], scale can be referred to a physically meaningful zero and thus constitute a ratio scale.)
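The interval-scale argument can be checked numerically. The following is a minimal sketch, not part of the original text; the helper names celsius_to_fahrenheit and celsius_to_kelvin are illustrative assumptions. It suggests that equal Celsius intervals stay equal in Fahrenheit, whereas the ratio of two temperatures depends on the scale unless the scale has a true zero.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvins (a scale with a true zero)."""
    return c + 273.15


# Equal intervals survive the change of scale: 20 to 25 C and 5 to 10 C
# are both 5 Celsius degrees and both 9 Fahrenheit degrees.
interval_warm = celsius_to_fahrenheit(25) - celsius_to_fahrenheit(20)
interval_cool = celsius_to_fahrenheit(10) - celsius_to_fahrenheit(5)

# Ratios do not survive: "40 is twice 20" holds only for the Celsius numbers.
ratio_celsius = 40 / 20
ratio_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)
ratio_kelvin = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print(interval_warm, interval_cool)
print(ratio_celsius, round(ratio_fahrenheit, 3), round(ratio_kelvin, 3))
```

Only the Kelvin ratio has a physical interpretation, because only that scale is referred to a meaningful zero.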

Some interval scales encountered in biological data collection are circular scales. Time of day and time of the year are examples of such scales. The interval between 2:00 p.m. (i.e., 1400 hr) and 3:30 p.m. (1530 hr) is the same as the interval between 8:00 a.m. (0800 hr) and 9:30 a.m. (0930 hr). But one cannot speak of ratios of times of day because the zero point (midnight) on the scale is arbitrary, in that one could just as well set up a scale for time of day which would have noon, or 3:00 p.m., or any other time as the zero point. Circular biological data are occasionally compass points, as if one records the compass direction in which an animal or plant is oriented. As the designation of north as 0° is arbitrary, this circular scale is a form of interval scale of measurement.

(c) Data on an Ordinal Scale. The preceding paragraphs on ratio and interval scales of measurement discussed data between which we know numerical differences. For example, if man A weighs 90 kg and man B weighs 80 kg, then man A is known to weigh 10 kg more than B. But our data may, instead, be a record only of the fact that man A weighs more than man B (with no indication of how much more). Thus, we may be dealing with relative differences rather than quantitative differences. Such data consist of an ordering or ranking of measurements and are said to be on an ordinal scale of measurement (ordinal being from the Latin word for “order”). We may speak of one biological entity being shorter, darker, faster, or more active than another; the sizes of five cell types might be labeled 1, 2, 3, 4, and 5, to denote their magnitudes relative to each other; or success in learning to run a maze may be recorded as A, B, or C.

∗ The German-Dutch physicist Gabriel Daniel Fahrenheit (1686–1736) invented the thermometer in 1714 and in 1724 employed a scale on which salt water froze at zero degrees, pure water froze at 32 degrees, and pure water boiled at 212 degrees. In 1742 the Swedish astronomer Anders Celsius (1701–1744) devised a temperature scale with 100 degrees between the freezing and boiling points of water (the so-called “centigrade” scale), first by referring to zero degrees as boiling and 100 degrees as freezing, and later (perhaps at the suggestion of Swedish botanist and taxonomist Carolus Linnaeus [1707–1778]) reversing these two reference points (Asimov, 1982: 177).

It is often true that biological data expressed on the ordinal scale could have been expressed on the interval or ratio scale had exact measurements been obtained (or obtainable). Sometimes data that were originally on interval or ratio scales will be changed to ranks; for example, examination grades of 99, 85, 73, and 66% (ratio scale) might be recorded as A, B, C, and D (ordinal scale), respectively.

Ordinal-scale data contain and convey less information than ratio or interval data, for only relative magnitudes are known. Consequently, quantitative comparisons are impossible (e.g., we cannot speak of a grade of C being half as good as a grade of A, or of the difference between cell sizes 1 and 2 being the same as the difference between sizes 3 and 4). However, we will see that many useful statistical procedures are, in fact, applicable to ordinal data.
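The loss of information in moving from a ratio scale to ranks can be illustrated with a short sketch. It is not from the original text: the function ranks_descending and the second set of grades are hypothetical, and giving tied values the same rank is just one possible convention.

```python
def ranks_descending(values):
    """Rank values from largest (rank 1) to smallest; ties share a rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


grades = [99, 85, 73, 66]        # percentages quoted in the text
other_grades = [99, 98, 97, 10]  # hypothetical, with very different spacing

# Both sets collapse to the same ranks, so the sizes of the gaps are lost.
print(ranks_descending(grades))
print(ranks_descending(other_grades))
```

Because the two sets of grades are indistinguishable once ranked, differences and ratios between ranks carry no quantitative meaning.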

(d) Data in Nominal Categories. Sometimes the variable being studied is classified by some qualitative measure it possesses rather than by a numerical measurement. In such cases the variable may be called an attribute, and we are said to be dealing with nominal, or categorical, data. Genetic phenotypes are commonly encountered biological attributes: The possible manifestations of an animal’s eye color might be brown or blue; and if human hair color were the attribute of interest, we might record black, brown, blond, or red. As other examples of nominal data (nominal is from the Latin word for “name”), people might be classified as male or female, or right-handed or left-handed. Or, plants might be classified as dead or alive, or as with or without fertilizer application. Taxonomic categories also form a nominal classification scheme (for example, plants in a study might be classified as pine, spruce, or fir).

Sometimes, data that might have been expressed on an ordinal, interval, or ratio scale of measurement may be recorded in nominal categories. For example, heights might be recorded as tall or short, or performance on an examination as pass or fail, where there is an arbitrary cut-off point on the measurement scale to separate tall from short and pass from fail.

As will be seen, statistical methods useful with ratio, interval, or ordinal data generally are not applicable to nominal data, and we must, therefore, be able to identify such situations when they occur.

(e) Continuous and Discrete Data. When we spoke previously of plant heights, we were dealing with a variable that could be any conceivable value within any observed range; this is referred to as a continuous variable. That is, if we measure a height of 35 cm and a height of 36 cm, an infinite number of heights is possible in the range from 35 to 36 cm: a plant might be 35.07 cm tall or 35.988 cm tall, or 35.3263 cm tall, and so on, although, of course, we do not have devices sensitive enough to detect this infinity of heights. A continuous variable is one for which there is a possible value between any other two values.

However, when speaking of the number of leaves on a plant, we are dealing with a variable that can take on only certain values. It might be possible to observe 27 leaves, or 28 leaves, but 27.43 leaves and 27.9 leaves are values of the variable that are impossible to obtain. Such a variable is termed a discrete or discontinuous variable (also known as a meristic variable). The number of white blood cells in 1 mm³ of blood, the number of giraffes visiting a water hole, and the number of eggs laid by a grasshopper are all discrete variables. The possible values of a discrete variable generally are consecutive integers, but this is not necessarily so. If the leaves on our plants are always formed in pairs, then only even integers are possible values of the variable. And the ratio of number of wings to number of legs of insects is a discrete variable that may only have the value of 0, 0.3333 ..., or 0.6666 ... (i.e., 0/6, 2/6, or 4/6, respectively).∗

Ratio-, interval-, and ordinal-scale data may be either continuous or discrete. Nominal-scale data by their nature are discrete.
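That a discrete variable need not be integer-valued can be seen by enumerating the wing-to-leg ratios mentioned above. This is only an illustrative sketch; the constant names are assumptions, and the wing counts 0, 2, and 4 are those implied by the fractions in the text.

```python
from fractions import Fraction

LEGS_PER_INSECT = 6
POSSIBLE_WING_COUNTS = (0, 2, 4)

# Exact fractions show that only three distinct values can ever occur.
wing_to_leg_ratios = [Fraction(w, LEGS_PER_INSECT) for w in POSSIBLE_WING_COUNTS]

for ratio in wing_to_leg_ratios:
    print(ratio, round(float(ratio), 4))
```

The variable is discrete because no value between these three can be observed, even though two of them are not whole numbers.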

2 ACCURACY AND SIGNIFICANT FIGURES

Accuracy is the nearness of a measurement to the true value of the variable being measured. Precision is not a synonymous term but refers to the closeness to each other of repeated measurements of the same quantity. Figure 1 illustrates the difference between accuracy and precision of measurements.

FIGURE 1: Accuracy and precision of measurements. A 3-kilogram animal is weighed 10 times. The 10 measurements shown in sample (a) are relatively accurate and precise; those in sample (b) are relatively accurate but not precise; those of sample (c) are relatively precise but not accurate; and those of sample (d) are relatively inaccurate and imprecise.
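One way to make the four cases of Figure 1 concrete is to summarize each sample by how far its mean lies from the true value (accuracy) and by how widely the measurements scatter (precision). The sketch below is not from the original text: the measurement values are invented for illustration, and using the mean offset and the sample standard deviation as summaries is an assumption, not a method given in this section.

```python
from statistics import mean, stdev

TRUE_WEIGHT_KG = 3.0


def bias(measurements):
    """Mean offset from the true value; near zero suggests accuracy."""
    return mean(measurements) - TRUE_WEIGHT_KG


def spread(measurements):
    """Sample standard deviation; a small value suggests precision."""
    return stdev(measurements)


# Hypothetical sets of 10 weighings of a 3-kg animal.
tight = [2.98, 3.01, 2.99, 3.02, 3.00, 2.97, 3.03, 3.00, 2.99, 3.01]
loose = [2.6, 3.4, 2.8, 3.2, 3.0, 2.5, 3.5, 2.9, 3.1, 3.0]

samples = {
    "a": tight,                               # accurate and precise
    "b": loose,                               # accurate but not precise
    "c": [round(x + 0.5, 2) for x in tight],  # precise but not accurate
    "d": [round(x + 0.5, 2) for x in loose],  # inaccurate and imprecise
}

for label, values in samples.items():
    print(label, round(bias(values), 3), round(spread(values), 3))
```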

Human error may exist in the recording of data. For example, a person may miscount the number of birds in a tract of land or misread the numbers on a heart-rate monitor. Or, a person might obtain correct data but record them in such a way (perhaps with poor handwriting) that a subsequent data analyst makes an error in reading them. We shall assume that such errors have not occurred, but there are other aspects of accuracy that should be considered.

∗ The ellipsis marks (...) may be read as “and so on.” Here, they indicate that 2/6 and 4/6 are repeating decimal fractions, which could just as well have been written as 0.3333333333333 ... and 0.6666666666666 ..., respectively.

Accuracy of measurement can be expressed in numerical reporting. If we report that the hind leg of a frog is 8 cm long, we are stating the number 8 (a value of a continuous variable) as an estimate of the frog’s true leg length. This estimate was made using some sort of a measuring device. Had the device been capable of more accuracy, we might have declared that the leg was 8.3 cm long, or perhaps 8.32 cm long. When recording values of continuous variables, it is important to designate the accuracy with which the measurements have been made. By convention, the value 8 denotes a measurement in the range of 7.50000 ... to 8.49999 ..., the value 8.3 designates a range of 8.25000 ... to 8.34999 ..., and the value 8.32 implies that the true value lies within the range of 8.31500 ... to 8.32499 .... That is, the reported value is the midpoint of the implied range, and the size of this range is designated by the last decimal place in the measurement. The value of 8 cm implies an ability to determine length within a range of 1 cm, 8.3 cm implies a range of 0.1 cm, and 8.32 cm implies a range of 0.01 cm. Thus, to record a value of 8.0 implies greater accuracy of measurement than does the recording of a value of 8, for in the first instance the true value is said to lie between 7.95000 ... and 8.04999 ... (i.e., within a range of 0.1 cm), whereas 8 implies a value between 7.50000 ... and 8.49999 ... (i.e., within a range of 1 cm). To state 8.00 cm implies a measurement that ascertains the frog’s limb length to be between 7.99500 ... and 8.00499 ... cm (i.e., within a range of 0.01 cm). Those digits in a number that denote the accuracy of the measurement are referred to as significant figures. Thus, 8 has one significant figure, 8.0 and 8.3 each have two significant figures, and 8.00 and 8.32 each have three.
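The convention that a reported value is the midpoint of an implied range can be sketched in code. The functions implied_range and significant_figures are hypothetical helpers, not something the text defines. They take the reported value as a string, because a float cannot distinguish 8 from 8.0 or 8.00. The upper bound returned is the limit that the text writes as, for example, 8.49999 ....

```python
from decimal import Decimal


def implied_range(reported: str):
    """Return (lower, upper) bounds implied by the last recorded digit."""
    value = Decimal(reported)
    # One unit in the last recorded place, e.g. 0.1 for "8.3".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half


def significant_figures(reported: str) -> int:
    """Count recorded digits; trailing zeros after the point are counted."""
    return len(Decimal(reported).as_tuple().digits)


for text in ("8", "8.0", "8.3", "8.00", "8.32"):
    low, high = implied_range(text)
    print(text, significant_figures(text), low, high)
```

This digit count is only reliable when a decimal point is written; for a whole number such as 72000 the trailing zeros are ambiguous and would all be counted here.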

In working with exact values of discrete variables, the preceding considerations do not apply. That is, it is sufficient to state that our frog has four limbs or that its left lung contains thirteen flukes. The use of 4.0 or 13.00 would be inappropriate, for as the numbers involved are exactly 4 and 13, there is no question of accuracy or significant figures.

But there are instances where significant figures and implied accuracy come into play with discrete data. An entomologist may report that there are 72,000 moths in a particular forest area. In doing so, it is probably not being claimed that this is the exact number but an estimate of the exact number, perhaps accurate to two significant figures. In such a case, 72,000 would imply a range of accuracy of 1000, so that the true value might lie anywhere from 71,500 to 72,500. If the entomologist wished to convey the fact that this estimate is believed to be accurate to the nearest 100 (i.e., to three significant figures), rather than to the nearest 1000, it would be better to present the data in the form of scientific notation,∗ as follows: If the number 7.2 × 10⁴ (= 72,000) is written, a range of accuracy of 0.1 × 10⁴ (= 1000) is implied, and the true value is assumed to lie between 71,500 and 72,500. But if 7.20 × 10⁴ were written, a range of accuracy of 0.01 × 10⁴ (= 100) would be implied, and the true value would be assumed to be in the range of 71,950 to 72,050. Thus, the accuracy of large values (and this applies to continuous as well as discrete variables) can be expressed succinctly using scientific notation.
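The moth example can be reproduced with standard string formatting. In this sketch the helper names to_scientific and implied_step are assumptions made for illustration; Python's "e" format writes 7.2 × 10⁴ as 7.2e+04.

```python
import math


def to_scientific(value: float, sig_figs: int) -> str:
    """Format value in scientific notation with sig_figs significant figures."""
    return f"{value:.{sig_figs - 1}e}"


def implied_step(value: float, sig_figs: int) -> int:
    """Size of the accuracy range implied by the last significant figure.

    Intended for values of 1 or more, as in the moth example.
    """
    exponent = math.floor(math.log10(abs(value)))
    return 10 ** (exponent - (sig_figs - 1))


estimate = 72000
for figures in (2, 3):
    step = implied_step(estimate, figures)
    low, high = estimate - step / 2, estimate + step / 2
    print(to_scientific(estimate, figures), step, low, high)
```

With two significant figures the implied range is 71,500 to 72,500; with three it narrows to 71,950 to 72,050, matching the values in the text.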

Calculators and computers typically yield results with more significant figures than are justified by the data. However, it is good practice (to avoid rounding error) to retain many significant figures until the last step in a sequence of calculations, and on attaining the result of the final step to round off to the appropriate number of figures.
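A small sketch shows why rounding is best left to the final step. The three values and the helper round_to are hypothetical, and rounding halves upward is an assumption chosen only to make the arithmetic easy to follow by hand.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to(x: Decimal, places: str) -> Decimal:
    """Round x to the decimal places shown in `places`, e.g. "0.1"."""
    return x.quantize(Decimal(places), rounding=ROUND_HALF_UP)


values = [Decimal("1.24"), Decimal("1.24"), Decimal("1.24")]  # hypothetical data

# Rounding each value before summing discards 0.04 three times over.
rounded_early = sum(round_to(v, "0.1") for v in values)

# Summing at full precision and rounding once keeps the error small.
rounded_late = round_to(sum(values), "0.1")

print(rounded_early, rounded_late)
```

The exact sum is 3.72, so rounding at the end gives 3.7, whereas rounding each intermediate value first gives 3.6.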

3 FREQUENCY DISTRIBUTIONS

When collecting and summarizing large amounts of data, it is often helpful to record the data in the form of a frequency table. Such a table simply involves a listing of all the observed values of the variable being studied and how many times each value is observed. Consider the tabulation of the frequency of occurrence of sparrow nests in each of several different locations. This is illustrated in Example 1, where the observed kinds of nest sites are listed, and for each kind the number of nests observed is recorded. The distribution of the total number of observations among the various categories is termed a frequency distribution. Example 1 is a frequency table for nominal data, and these data may also be presented graphically by means of a bar graph (Figure 2), where the height of each bar is proportional to the frequency in the class represented. The widths of all bars in a bar graph should be equal so that the eye of the reader is not distracted from the differences in bar heights; this also makes the area of each bar proportional to the frequency it represents. Also, the frequency scale on the vertical axis should begin at zero to avoid exaggerating the apparent differences among bars. If, for example, a bar graph of the data of Example 1 were constructed with the vertical axis representing frequencies of 45 to 60 rather than 0 to 60, the results would appear as in Figure 3. Huff (1954) illustrates other techniques that can mislead the readers of graphs. It is good practice to leave space between the bars of a bar graph of nominal data, to emphasize the distinctness among the categories represented.

∗ The use of scientific notation (by physicists) can be traced back to at least the 1860s (Miller, 2004b).

EXAMPLE 1 The Location of Sparrow Nests: A Frequency Table of Nominal Data

The variable is nest site, and there are four recorded categories of this variable. The numbers recorded in these categories constitute the frequency distribution.

Nest Site                        Number of Nests Observed
A. Vines                         56
B. Building eaves                60
C. Low tree branches             46
D. Tree and building cavities    49

FIGURE 2: A bar graph of the sparrow nest data of Example 1. An example of a bar graph for nominal data.
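The effect of starting the vertical axis at 45 instead of 0 can be quantified from the Example 1 counts. The sketch below is illustrative only; drawn_height and apparent_ratio are hypothetical helper names. It compares how tall two bars appear relative to each other under each choice of axis.

```python
nest_counts = {
    "Vines": 56,
    "Building eaves": 60,
    "Low tree branches": 46,
    "Tree and building cavities": 49,
}


def drawn_height(count: int, axis_start: int) -> int:
    """Height of a bar as drawn when the vertical axis begins at axis_start."""
    return count - axis_start


def apparent_ratio(site_a: str, site_b: str, axis_start: int) -> float:
    """Ratio of the drawn heights of the bars for two nest sites."""
    return drawn_height(nest_counts[site_a], axis_start) / drawn_height(
        nest_counts[site_b], axis_start
    )


# Axis from 0: the tallest bar is about 1.3 times the shortest.
print(round(apparent_ratio("Building eaves", "Low tree branches", 0), 2))
# Axis from 45: the same two bars appear to differ by a factor of 15.
print(apparent_ratio("Building eaves", "Low tree branches", 45))
```

Only with the axis beginning at zero are the drawn heights proportional to the frequencies themselves.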

A frequency tabulation of ordinal data might appear as in Example 2, which presents the observed numbers of sunfish collected in each of five categories, each category being a degree of skin pigmentation. A bar graph (Figure 4) can be prepared for this frequency distribution just as for nominal data.

FIGURE 3: A bar graph of the sparrow nest data of Example 1, drawn with the vertical axis starting at 45. Compare this with Figure 2, where the axis starts at 0.

EXAMPLE 2 Numbers of Sunfish, Tabulated According to Amount of Black Pigmentation: A Frequency Table of Ordinal Data

The variable is amount of pigmentation, which is expressed by numerically ordered classes. The numbers recorded for the five pigmentation classes compose the frequency distribution.

Pigmentation Class    Amount of Pigmentation       Number of Fish
0                     No black pigmentation        13
1                     Faintly speckled             68
2                     Moderately speckled          44
3                     Heavily speckled             21
4                     Solid black pigmentation     8

FIGURE 4: A bar graph of the sunfish pigmentation data of Example 2. An example of a bar graph for ordinal data.


In preparing frequency tables of interval- and ratio-scale data, we can make a procedural distinction between discrete and continuous data. Example 3 shows discrete data that are frequencies of litter sizes in foxes, and Figure 5 presents this frequency distribution graphically.

EXAMPLE 3 Frequency of Occurrence of Various Litter Sizes in Foxes: A Frequency Table of Discrete, Ratio-Scale Data

The variable is litter size, and the numbers recorded for the five litter sizes make up the frequency distribution.

FIGURE 5: A bar graph of the fox litter data of Example 3. An example of a bar graph for discrete, ratio-scale data.

Example 4a shows discrete data that are the numbers of aphids found per clover plant. These data create quite a lengthy frequency table, and it is not difficult to imagine sets of data whose tabulation would result in an even longer list of frequencies. Thus, for purposes of preparing bar graphs, we often cast data into a frequency table by grouping them.

Example 4b is a table of the data from Example 4a arranged by grouping the data into size classes. The bar graph for this distribution appears as Figure 6. Such grouping results in the loss of some information and is generally utilized only to make frequency tables and bar graphs easier to read, and not for calculations performed on the data. There have been several “rules of thumb” proposed to aid in deciding into how many classes data might reasonably be grouped, for the use of too few groups will obscure the general shape of the distribution. But such “rules” or recommendations are only rough guides, and the choice is generally left to good judgment, bearing in mind that from 10 to 20 groups are useful for most biological work. (See also Doane, 1976.) In general, groups should be established that are equal in the size interval of the variable being measured. (For example, the group size interval in Example 4b is four aphids per plant.)
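Grouping a discrete frequency table into equal-width classes, as described above, might be sketched as follows. The helper names are assumptions, and the per-value counts are invented for illustration; they were merely chosen so that the class totals agree with the first two rows of Example 4b.

```python
from collections import Counter


def class_label(value: int, width: int) -> str:
    """Label of the equal-width class that contains value, e.g. "4-7"."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


def group_frequencies(frequencies: dict, width: int) -> dict:
    """Sum per-value frequencies into classes of the given width."""
    grouped = Counter()
    for value, count in frequencies.items():
        grouped[class_label(value, width)] += count
    return dict(grouped)


# Hypothetical table: number of aphids on a plant -> number of plants observed.
ungrouped = {0: 3, 1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 5, 7: 7}

print(group_frequencies(ungrouped, width=4))
```

The grouped table is shorter but no longer records how many plants had, say, exactly five aphids, which is the loss of information the text refers to.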

EXAMPLE 4a Number of Aphids Observed per Clover Plant: A Frequency Table of Discrete, Ratio-Scale Data

Number of Aphids on a Plant    Number of Plants Observed
0                              3

Total number of observations = 424

Because continuous data, contrary to discrete data, can take on an infinity of values, one is essentially always dealing with a frequency distribution tabulated by groups. If the variable of interest were a weight, measured to the nearest 0.1 mg, a frequency table entry of the number of weights measured to be 48.6 mg would be interpreted to mean the number of weights grouped between 48.5500 ... and 48.6499 ... mg (although in a frequency table this class interval is usually written as 48.55–48.65). Example 5 presents a tabulation of 130 determinations of the amount of phosphorus, in milligrams per gram, in dried leaves. (Ignore the last two columns of this table until Section 4.)


EXAMPLE 4b Number of Aphids Observed per Clover Plant: A Frequency Table Grouping the Discrete, Ratio-Scale Data of Example 4a

Number of Aphids on a Plant    Number of Plants Observed
0–3                            6
4–7                            17
8–11                           40
12–15                          54
16–19                          59
20–23                          75
24–27                          77
28–31                          55
32–35                          32
36–39                          8
40–43                          1

Total number of observations = 424

FIGURE 6: A bar graph of the aphid data of Example 4b. An example of a bar graph for grouped discrete, ratio-scale data.

EXAMPLE 5 Determinations of the Amount of Phosphorus in Leaves: A Frequency Table of Continuous Data

                                              Cumulative frequency
Phosphorus        Frequency (i.e., number    Starting with    Starting with
(mg/g of leaf)    of determinations)         Low Values       High Values
8.15–8.25         2                          2                130
8.25–8.35         6                          8                128
8.35–8.45         8                          16               122
8.45–8.55         11                         27               114
8.55–8.65         17                         44               103
8.65–8.75         17                         61               86
8.75–8.85         24                         85               69
8.85–8.95         18                         103              45
8.95–9.05         13                         116              27
9.05–9.15         10                         126              14
9.15–9.25         4                          130              4

Total frequency = 130 = n

In presenting this frequency distribution graphically, one can prepare a histogram,∗ which is the name given to a bar graph based on continuous data. This is done in Figure 7; note that rather than indicating the range on the horizontal axis, we indicate only the midpoint of the range, a procedure that results in less crowded printing on the graph. Note also that adjacent bars in a histogram are often drawn touching each other, to emphasize the continuity of the scale of measurement, whereas in the other bar graphs discussed they generally are not.

FIGURE 7: A histogram of the leaf phosphorus data of Example 5. An example of a histogram for continuous data.

∗ The term histogram is from Greek roots (referring to a pole-shaped drawing) and was first published by Karl Pearson in 1895 (David, 1995).

FIGURE 8: A frequency polygon for the leaf phosphorus data of Example 5.

Often a frequency polygon is drawn instead of a histogram. This is done by plotting the frequency of each class as a dot (or other symbol) at the class midpoint and then connecting each adjacent pair of dots by a straight line (Figure 8). It is, of course, the same as if the midpoints of the tops of the histogram bars were connected by straight lines. Instead of plotting frequencies on the vertical axis, one can plot relative frequencies, or proportions of the total frequency. This enables different distributions to be readily compared and even plotted on the same axes. Sometimes, as in Figure 8, frequency is indicated on one vertical axis and the corresponding relative frequency on the other. (Using the data of Example 5, the relative frequency for 8.2 mg/g is 2/130 = 0.015, that for 8.3 mg/g is 6/130 = 0.046, that for 9.2 mg/g is 4/130 = 0.031, and so on. The total of all the frequencies is n, and the total of all the relative frequencies is 1.)
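The relative frequencies quoted above follow directly from the Example 5 frequencies. In this short sketch the variable names are illustrative, and the class midpoints are taken from the class intervals of Example 5.

```python
# Class midpoints (mg/g) and frequencies from Example 5.
midpoints = [8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9.0, 9.1, 9.2]
frequencies = [2, 6, 8, 11, 17, 17, 24, 18, 13, 10, 4]

n = sum(frequencies)  # total frequency
relative_frequencies = [f / n for f in frequencies]

for midpoint, f, rel in zip(midpoints, frequencies, relative_frequencies):
    print(f"{midpoint:.1f}  {f:3d}  {rel:.3f}")

# The relative frequencies should total 1, apart from floating-point error.
print(n, round(sum(relative_frequencies), 10))
```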

Frequency polygons are also commonly used for discrete distributions, but one can argue against their use when dealing with ordinal data, as the polygon implies to the reader a constant size interval horizontally between points on the polygon. Frequency polygons should not be employed for nominal-scale data.

If we have a frequency distribution of values of a continuous variable that falls into a large number of class intervals, the data may be grouped as was demonstrated with discrete variables. This results in fewer intervals, but each interval is, of course, larger. The midpoints of these intervals may then be used in the preparation of a histogram or frequency polygon. The user of frequency polygons is cautioned that such a graph is simply an aid to the eye in following trends in frequency distributions, and one should not attempt to read frequencies between points on the polygon. Also note that the method presented for the construction of histograms and frequency polygons requires that the class intervals be equal. Lastly, the vertical axis (e.g., the frequency scale) on frequency polygons and bar graphs generally should begin with zero, especially if graphs are to be compared with one another. If this is not done, the eye may be misled by the appearance of the graph (as shown for nominal-scale data in Figures 2 and 3).

4 CUMULATIVE FREQUENCY DISTRIBUTIONS

A frequency distribution informs us how many observations occurred for each value (or group of values) of a variable. That is, examination of the frequency table of Example 3 (or its corresponding bar graph or frequency polygon) would yield information such as, “How many fox litters of four were observed?”, the answer being 27. But if it is desired to ask questions such as, “How many litters of four or more were observed?”, or “How many fox litters of five or fewer were observed?”, we are speaking of cumulative frequencies. To answer the first question, we sum all frequencies for litter sizes four and up, and for the second question, we sum all frequencies from the smallest litter size up through a size of five. We arrive at answers of 54 and 59, respectively.

In Example 5, the phosphorus concentration data are cast into two cumulative frequency distributions, one with cumulation commencing at the low end of the measurement scale and one with cumulation being performed from the high values toward the low values. The choice of the direction of cumulation is immaterial, as can be demonstrated. If one desired to calculate the number of phosphorus determinations less than 8.55 mg/g, namely 27, a cumulation starting at the low end might be used, whereas the knowledge of the frequency of determinations greater than 8.55 mg/g, namely 103, can be readily obtained from the cumulation commencing from the high end of the scale. But one can easily calculate any frequency from a low-to-high cumulation (e.g., 27) from its complementary frequency from a high-to-low cumulation (e.g., 103), simply by knowing that the sum of these two frequencies is the total frequency (i.e., n = 130); therefore, in practice it is not necessary to calculate both sets of cumulations.
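Both cumulations of Example 5, and the complementary relationship between them, can be reproduced with a running sum. This is a sketch under the assumption that the frequencies are held in a list ordered from the lowest class to the highest; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies from Example 5, ordered from the lowest class to the highest.
frequencies = [2, 6, 8, 11, 17, 17, 24, 18, 13, 10, 4]
n = sum(frequencies)

# Cumulation starting with low values: observations in this class or below.
cumulative_from_low = list(accumulate(frequencies))

# Cumulation starting with high values: observations in this class or above.
cumulative_from_high = list(accumulate(reversed(frequencies)))[::-1]

print(cumulative_from_low)
print(cumulative_from_high)

# Determinations below 8.55 mg/g end with the fourth class (index 3);
# those above 8.55 mg/g begin with the fifth class (index 4).
below_8_55 = cumulative_from_low[3]
above_8_55 = cumulative_from_high[4]
print(below_8_55, above_8_55, below_8_55 + above_8_55 == n)
```

Since each low-to-high value and the next high-to-low value always sum to n, either column can be derived from the other by subtraction.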

Cumulative frequency distributions are useful in determining medians, percentiles, and other quantiles. They are not often presented in bar graphs, but cumulative frequency polygons (sometimes called ogives) are not uncommon. (See Figures 9 and 10.)

FIGURE 9: Cumulative frequency polygon of the leaf phosphorus data of Example 5, with cumulation commencing from the lowest to the highest values of the variable.

FIGURE 10: Cumulative frequency polygon of the leaf phosphorus data of Example 5, with cumulation commencing from the highest to the lowest values of the variable.

Relative frequencies (proportions of the total frequency) can be plotted instead of (or, as in Figures 9 and 10, in addition to) frequencies on the vertical axis of a cumulative frequency polygon. This enables different distributions to be readily compared and even plotted on the same axes. (Using the data of Example 5 for Figure 9, the relative cumulative frequency for 8.2 mg/g is 2/130 = 0.015, that for 8.3 mg/g is 8/130 = 0.062, and so on. For Figure 10, the relative cumulative frequency for 8.2 mg/g is 130/130 = 1.000, that for 8.3 mg/g is 128/130 = 0.985, and so on.)
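The arithmetic in the parenthetical note can be checked with a few lines of Python. This sketch uses only the four cumulative counts quoted above (2 and 8 out of n = 130); the variable and function names are assumptions made for illustration.

```python
n = 130  # total frequency quoted in the text

# Low-to-high cumulative frequencies quoted for the first two classes (mg/g).
cum_low_to_high = {8.2: 2, 8.3: 8}

# High-to-low cumulative frequency for a class is the total minus
# everything that lies below that class.
cum_high_to_low = {8.2: n - 0, 8.3: n - cum_low_to_high[8.2]}

def relative(cumulative_count, total):
    """Relative cumulative frequency, rounded to three decimals as in the text."""
    return round(cumulative_count / total, 3)

rel_low = {c: relative(f, n) for c, f in cum_low_to_high.items()}
rel_high = {c: relative(f, n) for c, f in cum_high_to_low.items()}

print(rel_low)   # values for Figure 9
print(rel_high)  # values for Figure 10
```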

Populations and Samples

From Chapter 2 of Biostatistical Analysis, Fifth Edition, Jerrold H. Zar. Copyright © 2010 by Pearson Education, Inc. Publishing as Pearson Prentice Hall. All rights reserved.

1 POPULATIONS

2 SAMPLES FROM POPULATIONS

3 RANDOM SAMPLING

4 PARAMETERS AND STATISTICS

5 OUTLIERS

1 POPULATIONS

The primary objective of a statistical analysis is to infer characteristics of a group of data by analyzing the characteristics of a small sampling of the group. This generalization from the part to the whole requires the consideration of such important concepts as population, sample, parameter, statistic, and random sampling. These topics are discussed in this chapter.

Basic to statistical analysis is the desire to draw conclusions about a group of measurements of a variable being studied. Biologists often speak of a “population” as a defined group of humans or of another species of organisms. Statisticians speak of a population (also called a universe) as a group of measurements (not organisms) about which one wishes to draw conclusions. It is the latter definition, the statistical definition of population, that will be used throughout this text. For example, an investigator may desire to draw conclusions about the tail lengths of bobcats in Montana. All Montana bobcat tail lengths are, therefore, the population under consideration. If a study is concerned with the blood-glucose concentration in three-year-old children, then the blood-glucose levels in all children of that age are the population of interest.

Populations are often very large, such as the body weights of all grasshoppers in Kansas or the eye colors of all female New Zealanders, but occasionally populations of interest may be relatively small, such as the ages of men who have traveled to the moon or the heights of women who have swum the English Channel.

2 SAMPLES FROM POPULATIONS

If the population under study is very small, it might be practical to obtain all the measurements in the population. If one wishes to draw conclusions about the ages of all men who have traveled to the moon, it would not be unreasonable to attempt to collect all the ages of the small number of individuals under consideration. Generally, however, populations of interest are so large that obtaining all the measurements is unfeasible. For example, we could not reasonably expect to determine the body weight of every grasshopper in Kansas. What can be done in such cases is to obtain a subset of all the measurements in the population. This subset of measurements constitutes a sample, and from the characteristics of samples we can draw conclusions about the characteristics of the populations from which the samples came.∗

Biologists may sample a population that does not physically exist. Suppose an experiment is performed in which a food supplement is administered to 40 guinea pigs, and the sample data consist of the growth rates of these 40 animals. Then the population about which conclusions might be drawn is the growth rates of all the guinea pigs that conceivably might have been administered the same food supplement under identical conditions. Such a population is said to be “imaginary” and is also referred to as “hypothetical” or “potential.”

3 RANDOM SAMPLING

Samples from populations can be obtained in a number of ways; however, for a sample to be representative of the population from which it came, and to reach valid conclusions about populations by induction from samples, statistical procedures typically assume that the samples are obtained in a random fashion. To sample a population randomly requires that each member of the population has an equal and independent chance of being selected. That is, not only must each measurement in the population have an equal chance of being chosen as a member of the sample, but the selection of any member of the population must in no way influence the selection of any other member. Throughout this text, “sample” will always imply “random sample.”†

It is sometimes possible to assign each member of a population a unique number and to draw a sample by choosing a set of such numbers at random. This is equivalent to having all members of a population in a hat and drawing a sample from them while blindfolded. Table 41 from Appendix: Statistical Tables and Graphs provides 10,000 random digits for this purpose. In this table, each digit from 0 to 9 has an equal and independent chance of appearing anywhere in the table. Similarly, each combination of two digits, from 00 to 99, is found at random in the table, as is each three-digit combination, from 000 to 999, and so on.

Assume that a random sample of 200 names is desired from a telephone directory having 274 pages, three columns of names per page, and 98 names per column. Entering Table 41 from Appendix: Statistical Tables and Graphs at random (i.e., do not always enter the table at the same place), one might decide first to arrive at a random combination of three digits. If this three-digit number is 001 to 274, it can be taken as a randomly chosen page number (if it is 000 or larger than 274, simply skip it and choose another three-digit number, e.g., the next one on the table). Then one might examine the next digit in the table; if it is a 1, 2, or 3, let it denote a page column (if a digit other than 1, 2, or 3 is encountered, it is ignored, passing to the next digit that is 1, 2, or 3). Then one could look at the next two-digit number in the table; if it is from 01 to 98, let it represent a randomly selected name within that column. This three-step procedure would be performed a total of 200 times to obtain the desired random sample. One can proceed in any direction in the random number table: left to right, right to left, upward, downward, or diagonally; but the direction should be decided on before looking at the table. Computers are capable of quickly generating random numbers (sometimes called “pseudorandom” numbers because the number generation is not perfectly random), and this is how Table 41 from Appendix: Statistical Tables and Graphs was derived.
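The three-step directory procedure can be sketched in Python. This is only an illustration: a pseudorandom digit stream stands in for the printed table of random digits, and the names `digit_stream`, `next_number`, and `draw_sample` are hypothetical helpers, not anything defined in the text. Like the procedure as described, the sketch does not check whether the same directory entry is drawn more than once.

```python
import random

PAGES, COLUMNS, NAMES = 274, 3, 98  # directory dimensions from the text

def digit_stream(rng):
    """Endless stream of digits 0-9, standing in for the random digit table."""
    while True:
        yield rng.randint(0, 9)

def next_number(digits, width, low, high):
    """Read `width` digits at a time; skip any value outside low..high."""
    while True:
        value = 0
        for _ in range(width):
            value = value * 10 + next(digits)
        if low <= value <= high:
            return value

def draw_sample(size, seed=None):
    """Repeat the three-step page/column/name selection `size` times."""
    rng = random.Random(seed)
    digits = digit_stream(rng)
    sample = []
    for _ in range(size):
        page = next_number(digits, 3, 1, PAGES)      # 001 to 274
        column = next_number(digits, 1, 1, COLUMNS)  # 1, 2, or 3
        name = next_number(digits, 2, 1, NAMES)      # 01 to 98
        sample.append((page, column, name))
    return sample

sample = draw_sample(200, seed=1)
print(sample[:3])
```

Skipping out-of-range values, rather than remapping them, is what keeps every page, column, and name equally likely.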

∗ This use of the terms population and sample was established by Karl Pearson (1903).

† This concept of random sampling was established by Karl Pearson between 1897 and 1903 (Miller, 2004a).

Very often it is not possible to assign a number to each member of a population, and random sampling then involves biological, rather than simply mathematical, considerations. That is, the techniques for sampling Montana bobcats or Kansas grasshoppers require knowledge about the particular organism to ensure that the sampling is random. Researchers consult relevant books, periodical articles, or reports that address the specific kind of biological measurement to be obtained.

4 PARAMETERS AND STATISTICS

Several measures help to describe or characterize a population. For example, generally a preponderance of measurements occurs somewhere around the middle of the range of a population of measurements. Thus, some indication of a population “average” would express a useful bit of descriptive information. Such information is called a measure of central tendency (also called a measure of location).

It is also important to describe how dispersed the measurements are around the “average.” That is, we can ask whether there is a wide spread of values in the population or whether the values are rather concentrated around the middle. Such a descriptive property is called a measure of variability (or a measure of dispersion).

A quantity such as a measure of central tendency or a measure of dispersion is called a parameter when it describes or characterizes a population, and we shall be very interested in discussing parameters and drawing conclusions about them. Section 2 pointed out, however, that one seldom has data for entire populations, but nearly always has to rely on samples to arrive at conclusions about populations. Thus, one rarely is able to calculate parameters. However, by random sampling of populations, parameters can be estimated well. An estimate of a population parameter is called a statistic.∗ It is statistical convention to represent population parameters by Greek letters and sample statistics by Latin letters; this custom will be demonstrated for specific examples.

The statistics one calculates will vary from sample to sample for samples taken from the same population. Because one uses sample statistics as estimates of population parameters, it behooves the researcher to arrive at the “best” estimates possible. As for what properties to desire in a “good” estimate, consider the following.

First, it is desirable that if we take an indefinitely large number of samples from a population, the long-run average of the statistics obtained will equal the parameter being estimated. That is, for some samples a statistic may underestimate the parameter of interest, and for others it may overestimate that parameter; but in the long run the estimates that are too low and those that are too high will “average out.” If such a property is exhibited by a statistic, we say that we have an unbiased statistic or an unbiased estimator.
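The “averaging out” of an unbiased statistic can be illustrated by simulation. The following Python sketch assumes a hypothetical population of 10,000 simulated measurements and uses the sample mean as the statistic; the population, the sample size of 10, and the function name are all assumptions made for illustration only.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 10,000 measurements (simulated, not real data).
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
mu = statistics.fmean(population)  # the parameter being estimated

def average_of_sample_means(population, sample_size, n_samples, rng):
    """Draw many random samples and average the resulting sample means."""
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.fmean(means)

# Individual sample means fall above or below mu, but their long-run
# average settles very close to mu.
long_run = average_of_sample_means(population, 10, 5000, rng)
print(mu, long_run)
```

Any single sample of 10 may miss the population mean noticeably; it is only the average over many samples that is expected to match it.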

Second, it is desirable that a statistic obtained from any single sample from a population be very close to the value of the parameter being estimated. This property of a statistic is referred to as precision,† efficiency, or reliability. As we commonly secure only one sample from a population, it is important to arrive at a close estimate of a parameter from a single sample.

∗ This use of the terms parameter and statistic was defined by R. A. Fisher as early as 1922 (Miller, 2004a; Savage, 1976).

† The precision of a sample statistic, as defined here, should not be confused with the precision of a measurement.

Third, consider that one can take larger and larger samples from a population (the largest sample being the entire population). As the sample size increases, a consistent statistic will become a better estimate of the parameter it is estimating. Indeed, if the sample were the size of the population, then the best estimate would be obtained: the parameter itself.

5 OUTLIERS

Occasionally, a set of data will have one or more observations that are so different, relative to the other data in the sample, that we doubt they should be part of the sample. For example, suppose a researcher collected a sample consisting of the body weights of nineteen 20-week-old mallard ducks raised in individual laboratory cages, for which the following 19 data were recorded:

1.87, 3.75, 3.79, 3.82, 3.85, 3.87, 3.90, 3.94, 3.96, 3.99, 3.99, 4.00, 4.03, 4.04, 4.05, 4.06, 4.09, 8.97, and 39.8 kilograms.

Visual inspection of these 19 recorded data casts doubt upon the smallest datum (1.87 kg) and the two largest data (8.97 kg and 39.8 kg) because they differ so greatly from the rest of the weights in the sample. Data in striking disagreement with nearly all the other data in a sample are often called outliers or discordant data, and the occurrence of such observations generally calls for closer examination.

Sometimes it is clear that an outlier is the result of incorrect recording of data. In the preceding example, a mallard duck weight of 39.8 kg is highly unlikely (to say the least!), for that is about the weight of a 12-year-old boy or girl (and such a duck would probably not fit in one of the laboratory cages). In this case, inspection of the data records might lead us to conclude that this body weight was recorded with a careless placement of the decimal point and should have been 3.98 kg instead of 39.8 kg. And, upon interrogation, the research assistant may admit to weighing the eighteenth duck with the scale set to pounds instead of kilograms, so the metric weight of that animal should have been recorded as 4.07 (not 8.97) kg.
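The two corrections described above are simple arithmetic and can be checked directly. In this Python sketch the variable names are illustrative assumptions; the conversion factor is the standard definition of the pound in kilograms.

```python
LB_TO_KG = 0.45359237  # one avoirdupois pound in kilograms (exact by definition)

# Eighteenth duck: the reading of 8.97 was taken in pounds, not kilograms.
corrected_18th = round(8.97 * LB_TO_KG, 2)

# Nineteenth duck: the decimal point was misplaced by one position.
corrected_19th = round(39.8 / 10, 2)

print(corrected_18th, corrected_19th)  # 4.07 3.98
```

Both corrected values fall within the range of the other ducks' weights, which is consistent with the explanation of recording errors.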

Also, upon further examination of the data-collection process, we may find that the 1.87-kg duck was taken from a wrong cage and was, in fact, only 4 weeks old, not 20 weeks old, and therefore did not belong in this sample. Or, perhaps we find that it was not a mallard duck, but some other bird species (and, therefore, did not belong in this sample). Statisticians say a sample is contaminated if it contains a datum that does not conform to the characteristics of the population being sampled. So the weight of a 4-week-old duck, or of a bird of a different species, would be a statistical contaminant and should be deleted from this sample.

There are also instances where it is known that a measurement was faulty, for example, when a laboratory technician spills coffee onto an electronic measuring device or into a blood sample to be analyzed. In such a case, the measurements known to be erroneous should be eliminated from the sample.

However, outlying data can also be correct observations taken from an intended population, collected purely by chance. As we shall see, when drawing a random sample from a population, it is relatively likely that a datum in the sample will be around the average of the population and very unlikely that a sample datum will be dramatically far from the average. But sample data very far from the average still may be possible.
