USING STATISTICS IN THE SOCIAL AND HEALTH SCIENCES
WITH SPSS® AND EXCEL®
Between and Within Research Designs
Using Different T Tests
Independent T Test: The Procedure
Creating the Sampling Distribution of Differences
The Nature of the Sampling Distribution of Differences
Calculating the Estimated Standard Error of Difference with Equal Sample Size
Using Unequal Sample Sizes
The Independent T Ratio
Independent T Test Example
Hypothesis Test Elements for the Example
Before–After Convention with the Independent T Test
Confidence Intervals for the Independent T Test
Effect Size
The Assumptions for the Independent T Test
SPSS® Explore for Checking the Normal Distribution Assumption
Excel Procedures for Checking the Equal Variance Assumption
SPSS® Procedure for Checking the Equal Variance Assumption
Using SPSS® and Excel with the Independent T Test
SPSS® Procedures for the Independent T Test
Excel Procedures for the Independent T Test
Effect Size for the Independent T Test Example
Parting Comments
Nonparametric Statistics: The Mann–Whitney U Test
Terms and Concepts
Data Lab and Examples (with Solutions)
Data Lab: Solutions
Graphics in the Data Summary
9 ANALYSIS OF VARIANCE
A Hypothetical Example of ANOVA
The Nature of ANOVA
The Components of Variance
The Process of ANOVA
Calculating ANOVA
Effect Size
Post Hoc Analyses
Assumptions of ANOVA
Additional Considerations with ANOVA
The Hypothesis Test: Interpreting ANOVA Results
Are the Assumptions Met?
Using SPSS® and Excel with One-Way ANOVA
The Need for Diagnostics
Non-Parametric ANOVA Tests: The Kruskal–Wallis Test
Terms and Concepts
Data Lab and Examples (with Solutions)
Data Lab: Solutions
10 FACTORIAL ANOVA
Extensions of ANOVA
ANCOVA
MANOVA
MANCOVA
Factorial ANOVA
Interaction Effects
Simple Effects
2XANOVA: An Example
Calculating Factorial ANOVA
The Hypotheses Test: Interpreting Factorial ANOVA Results
Effect Size for 2XANOVA: Partial η²
Discussing the Results
Using SPSS® to Analyze 2XANOVA
Summary Chart for 2XANOVA Procedures
Terms and Concepts
Data Lab and Examples (with Solutions)
Data Lab: Solutions
11 CORRELATION
The Nature of Correlation
The Correlation Design
Pearson’s Correlation Coefficient
Plotting the Correlation: The Scattergram
Using SPSS® to Create Scattergrams
Using Excel to Create Scattergrams
Calculating Pearson’s r
The Z Score Method
The Computation Method
The Hypothesis Test for Pearson’s r
Effect Size: the Coefficient of Determination
Diagnostics: Correlation Problems
Correlation Using SPSS® and Excel
Nonparametric Statistics: Spearman’s Rank Order Correlation (rs)
Terms and Concepts
Data Lab and Examples (with Solutions)
Data Lab: Solutions
12 BIVARIATE REGRESSION
The Nature of Regression
The Regression Line
Calculating Regression
Effect Size of Regression
The Z Score Formula for Regression
Testing the Regression Hypotheses
The Standard Error of Estimate
Confidence Interval
Explaining Variance Through Regression
A Numerical Example of Partitioning the Variation
Using Excel and SPSS® with Bivariate Regression
The SPSS® Regression Output
The Excel Regression Output
Complete Example of Bivariate Linear Regression
Assumptions of Bivariate Regression
The Omnibus Test Results
Effect Size
The Model Summary
The Regression Equation and Individual Predictor Test of Significance
Advanced Regression Procedures
Detecting Problems in Bivariate Linear Regression
Terms and Concepts
Data Lab and Examples (with Solutions)
Data Lab: Solutions
13 INTRODUCTION TO MULTIPLE LINEAR REGRESSION
The Elements of Multiple Linear Regression
Same Process as Bivariate Regression
Some Differences between Bivariate Linear Regression and Multiple Linear Regression
Stuff not Covered
Assumptions of Multiple Linear Regression
Analyzing Residuals to Check MLR Assumptions
Diagnostics for MLR: Cleaning and Checking Data
Extreme Scores
Distance Statistics
Influence Statistics
MLR Extended Example Data
Assumptions Met?
Analyzing Residuals: Are Assumptions Met?
Interpreting the SPSS® Findings for MLR
Entering Predictors Together as a Block
Entering Predictors Separately
Additional Entry Methods for MLR Analyses
Example Study Conclusion
Terms and Concepts
Data Lab and Example (with Solution)
Data Lab: Solution
14 CHI-SQUARE AND CONTINGENCY TABLE ANALYSIS
Contingency Tables
The Chi-square Procedure and Research Design
Chi-square Design One: Goodness of Fit
A Hypothetical Example: Goodness of Fit
Effect Size: Goodness of Fit
Chi-square Design Two: The Test of Independence
A Hypothetical Example: Test of Independence
Special 2 × 2 Chi-square
Effect Size in 2 × 2 Tables: PHI
Cramer’s V: Effect Size for the Chi-square Test of Independence
Repeated Measures Chi-square: McNemar Test
Using SPSS® and Excel with Chi-square
Using SPSS® for the Chi-square Test of Independence
Using Excel for Chi-square Analyses
Terms and Concepts
Data Lab and Examples (with Solutions)
Data Lab: Solutions
15 REPEATED MEASURES PROCEDURES: Tdep AND ANOVAws
Independent and Dependent Samples in Research Designs
Using Different T Tests
The Dependent T Test Calculation: The “Long” Formula
Example: The Long Formula
The Dependent T Test Calculation: The “Difference” Formula
Tdep and Power
Conducting The Tdep Analysis Using SPSS®
Conducting The Tdep Analysis Using Excel
Within-Subject ANOVA (ANOVAws)
Experimental Designs
Post Facto Designs
Within-Subject Example
Using SPSS® for Within-Subject Data
The SPSS® Procedure
The SPSS® Output
Nonparametric Statistics
Terms and Concepts
APPENDICES
Appendix A SPSS® BASICS
Using SPSS®
General Features
Management Functions
Additional Management Functions
Appendix B EXCEL BASICS
Data Management
The Excel Menus
Using Statistical Functions
Data Analysis Procedures
Missing Values and “0” Values in Excel Analyses
Using Excel with “Real Data”
Appendix C STATISTICAL TABLES
Table C.1: Z-Score Table (Values Shown are Percentages, %)
Table C.2: Exclusion Values for the T-Distribution
Table C.3: Critical (Exclusion) Values for the Distribution of F
Table C.4: Tukey’s Range Test (Upper 5% Points)
Table C.5: Critical (Exclusion) Values for Pearson’s Correlation Coefficient, r
Table C.6: Critical Values of the χ² (Chi-Square) Distribution
REFERENCES
Index
PREFACE
The study of statistics is gaining recognition in a great many fields. In particular, researchers in the social and health sciences note its importance for problem solving and its practical importance in their areas. Statistics has always been important, for example, among those hoping to enter careers in medicine, but more so now due to the increasing emphasis on “Scientific Inquiry & Reasoning Skills” as preparation for the Medical College Admission Test (MCAT). Sociology, always relying on statistics and research for its core emphases, is now included in the MCAT as well.
This book focuses squarely on the procedures important to an essential understanding of statistics and how it is used in the real world for problem solving. Moreover, my discussion in the book repeatedly ties statistical methodology with research design (see the “companion” volume my colleague and I wrote to emphasize research and design skills in social science; Abbott and McKinney, 2013).
I emphasize applied statistical analyses and as such will use examples throughout the book drawn from my own research as well as from national databases like the General Social Survey (GSS) and the Behavioral Risk Factor Surveillance System (BRFSS). Using data from these sources allows students the opportunity to see how statistical procedures apply to research in their fields as well as to examine “real data.” A central feature of the book is my discussion and use of SPSS® and Microsoft Excel® to analyze data for problem solving.
Throughout my teaching and research career, I have developed an approach to helping students understand difficult statistical concepts in a new way. I find that the great majority of students are visual learners, so I developed diagrams and figures over the years that help create a conceptual picture of the statistical procedures that are often problematic to students (like sampling distributions!).
Another reason for writing this book was to give students a way to understand statistical computing without having to rely on comprehensive and expensive statistical software programs. Since most students have access to Microsoft Excel, I developed a step-by-step approach to using the powerful statistical procedures in Excel to analyze data and conduct research in each of the statistical topics I cover in the book.¹
I also wanted to make those comprehensive statistical programs more approachable to statistics students, so I have also included a “hands-on” guide to SPSS in parallel with the Excel examples. In some cases, only SPSS can perform a given statistical procedure, but in most cases, both Excel and SPSS can be used.
Here are some of the features of the book:
1. Emphasis on the interpretation of findings.
2. Use of clear examples from my existing and former research projects and large databases to illustrate statistical procedures. “Real-world” data can be cumbersome, so I introduce straightforward procedures and examples in order to help students focus more on interpretation of findings.
3. Inclusion of a data lab section in each chapter that provides relevant, clear examples.
4. Introduction to advanced statistical procedures in chapter sections (e.g., regression diagnostics) and separate chapters (e.g., multiple linear regression) for greater relevance to real-world research needs.
5. Strengthening of the connection between statistical application and research designs.
6. Inclusion of detailed sections in each chapter explaining applications from Excel and SPSS.
I use SPSS² (versions 22 and 23) screenshots of menus and tables by permission from the IBM® Company. IBM, the IBM logo, ibm.com, and SPSS are trademarks or registered trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “IBM Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml. Microsoft Excel references and screenshots in this book are used with permission from Microsoft. I use Microsoft Excel® 2013 in this book.³
I use GSS (2014) data and codebook for examples in this book.⁴ The BRFSS Survey Questionnaire and Data are used with permission from the CDC.⁵
¹ One limitation to teaching statistics procedures with Excel is that the data analysis features are different depending on whether the user is a “Mac” user or a “PC” user. I am using the PC version, which features a “Data Analysis” suite of statistical tools. This feature may no longer be included in the Mac version of Excel.
² SPSS screen reprints throughout the book are used courtesy of International Business Machines Corporation, © International Business Machines Corporation. SPSS was acquired by IBM in October 2009.
³ Excel references and screenshots in this book are used with permission from Microsoft®.
⁴ Smith, Tom W., Peter Marsden, Michael Hout, and Jibum Kim. General Social Surveys, 1972–2012 [machine-readable data file] / Principal Investigator, Tom W. Smith; Coprincipal Investigator, Peter V. Marsden; Coprincipal Investigator, Michael Hout; Sponsored by National Science Foundation. NORC ed. Chicago: National Opinion Research Center [producer]; Storrs, CT: The Roper Center for Public Opinion Research, University of Connecticut [distributor], 2013. 1 data file (57,061 logical records) + 1 codebook (3432 pp.). (National Data Program for the Social Sciences, No. 21).
⁵ Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Questionnaire. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2013 and Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Data. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2013.
ACKNOWLEDGMENTS
I wish to thank my daughter Kristin Hovaguimian for her outstanding work on the Index to this book (and all the others!), which is not an easy task with a book of this nature.
I thank my wife Kathleen Abbott for her dedication and amazing contributions to the editing process.
I thank my son Matthew Abbott for the inspiration he has always provided in matters statistical and philosophical.
Thank you, Jon Gurstelle and the team at Wiley, for your continuing support of this project.
1 INTRODUCTION
The world suddenly has become awash in data! A great many popular books have been written recently that extol “big data” and the information derived for decision makers. These data are considered “big” because a certain “catalog” of data may be so large that traditional ways of managing and analyzing such information cannot easily accommodate it. The data originate from you and me whenever we use certain social media, or make purchases online, or have information derived from us through radio frequency identification (RFID) readers attached to clothing and cars, even implanted in animals, and so on. The result is a massive avalanche of information that exists for business leaders, decision makers, and researchers to use for predicting related behaviors and attitudes.
BIG DATA ANALYSIS
Decision makers are trying to figure out how to manage and use the information available. Typical computer software used for statistical decision making is currently limited to a number of cases far below that which is available for consideration of big data. A traditional approach to address this issue is known as “data mining” in which a number of techniques, including statistics, are used to discover patterns in a large set of data.
Using Statistics in the Social and Health Sciences with SPSS® and Excel®, First Edition. Martin Lee Abbott. © 2017 John Wiley & Sons, Inc. Published 2017 by John Wiley & Sons, Inc.
Researchers may be overjoyed with the availability of such rich data, but it provides both opportunities and challenges. On the opportunity side, never before have such large amounts of information been available to assist researchers and policy makers in understanding widespread public thinking and behavior. On the challenge side, however, are several difficult questions:
• How are such data to be examined?
• Do current social science methods and processes provide guidance to examining data sets that surpass historical data-gathering capacity?
• Are big data representative?
• Do data sets so large obviate the need for probability-based research analyses?
• Do decision makers understand how to use social science methodology to assist in their analyses of emerging data?
• Will the decisions emerging from big data be used ethically, within the context of social science research guidelines?
• Will effect size considerations overshadow questions of significance testing?
Social scientists can rely on existing statistical methods to manage and analyze big data, but the way in which the analyses are used for decision making will change. One trend is that prediction may be hailed as a more prominent method for understanding the data than traditional hypothesis testing. We will have more to say about this distinction later in the book, but it is important at this point to see that researchers will need to adapt statistical approaches for analyzing big data.
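The question of effect size versus significance testing raised in the list above can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the group means, the common standard deviation, and the sample sizes are hypothetical, the test is a simple two-group z test with a known standard deviation, and the effect size is the standardized mean difference (often called Cohen's d). None of these choices is taken from the book's own examples.

```python
import math

def two_group_z_test(mean_a, mean_b, sd, n_per_group):
    """Return (z, two-sided p) for a difference in means, assuming a known common SD."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = (mean_a - mean_b) / standard_error
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

def cohens_d(mean_a, mean_b, sd):
    """Standardized mean difference; it does not depend on sample size."""
    return (mean_a - mean_b) / sd

# Hypothetical summary values, chosen only for illustration
MEAN_A, MEAN_B, SD = 100.2, 100.0, 15.0

results = {}
for n in (100, 1_000_000):
    z, p = two_group_z_test(MEAN_A, MEAN_B, SD, n)
    results[n] = {"z": z, "p": p, "d": cohens_d(MEAN_A, MEAN_B, SD)}
    print(f"n per group = {n:>9,}: z = {z:6.2f}, p = {p:.3g}, d = {results[n]['d']:.3f}")
```

With the same tiny difference between groups, the test is nowhere near significant with 100 cases per group but becomes overwhelmingly significant with a million cases per group, while the effect size stays the same. This is one reason effect size may matter more than the significance test when data sets are very large.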
VISUAL DATA ANALYSIS
Another emerging trend for understanding and managing the swell of data is the use of visuals. Of course, visual descriptions of data have been used for centuries. It is commonly acknowledged that the first “pie chart” was published by Playfair (1801). Playfair’s example in Figure 1.1 compares the dynamics of nations over time.
Figure 1.1 compares nations using size, color, and orientation over time. Using this method for comparing information has been useful for viewing the patterns in data not readily observable from numerical analysis.
As with numerical methods, however, there are opportunities and challenges in the use of visual analyses:
• Can visual means be used to convey complex meaning?
• Are there “rules” that will help to ensure a standard way of creating, analyzing, and interpreting such visual information?
• Will visual analyses become divorced from numerical analysis so that observers have no way of objectively confirming the meaning of the images?
Several visual data analysis software programs have appeared over the last several years. Simply running an online search will yield several possibilities, including many that offer free (initial) programs for cataloging and presenting data from the user. I offer one very important caveat (see the final bullet point earlier), which is that it is important to perform visual data analysis in concert with numerical analysis. As we will see later in the book, it is easy to intentionally or unintentionally mislead readers using visual presentations when these are divorced from numerical statistical means that discuss the “significance” and “meaningfulness” of the visual data.
Figure 1.1 William Playfair’s pie chart. Source: https://commons.wikimedia.org/wiki/File:Playfair_piecharts.jpg. Public domain.
IMPORTANCE OF STATISTICS FOR THE SOCIAL AND HEALTH SCIENCES AND MEDICINE
The presence of so much rich information presents meaningful opportunities for understanding many of the processes that affect the social world. While much of the time big data analyses are used for understanding business dynamics and economic trends, it is also important to focus on those data patterns that can affect the social sphere beyond these indicators: social and psychological behavior and attitudes, changes in understanding health and medicine, and educational progress. These social indicators have been the subject of a great deal of analyses over the decades and now may make significant advances depending on how big data are analyzed and managed. On a related note, the social sciences (especially sociology and psychology) are now areas included in the new Medical College Admission Test (MCAT), which also includes greater emphasis upon “Scientific Inquiry & Reasoning Skills.” The material we will learn from this book will help to support study in these areas for aspiring health and medical professionals.
In this book, I intend to focus on how to use and analyze data of all sizes and shapes. While we will be limited in our ability to dive into the world of big data fully, we can study the basics of how to recognize, generate, interpret, and critique analyses of data for decision making. One of the first lessons is that data can be understood both numerically and visually. When we describe information, we are attempting to see and convey underlying meaning in the numbers and visual expressions. If I have a collection of data, I cannot recognize its meaning by simply looking at it. However, if I apply certain numerical and visual methods to organize the data, I can see what patterns lie below the surface.
HISTORICAL NOTES: EARLY USE OF STATISTICS
Statistics as a field has had a long and colorful history. Students will recognize some prominent names as the field developed its mathematical identity: Pearson, Fisher, Bayes, Laplace, and others. But it is important to note that some of the earliest statistical studies were based in solving social and political problems.
One of the earliest of such studies was developed by John Graunt who compiled information from Bills of Mortality to detect, among other things, the impact and origins of deaths by plague. Parish records documented christenings, weddings, and burials at the time, so Graunt’s study tracked the number of deaths in the parishes as a way to understand the dynamics of the plague. His broader goal was to predict the population of London using extant data from the parish records.
Figure 1.2 John Snow’s map showing deaths in the London cholera epidemic of 1854. Source: https://commons.wikimedia.org/wiki/File:Snow-cholera-map-1.jpg. Public domain.
Another early use of statistics was Dr John Snow’s map showing deaths in the houses of London’s Soho District during the 1854 cholera epidemic, as popularized by Johnson’s book, The Ghost Map (2006). In order to investigate the reasons for the spread of cholera other than odor (“miasma theory”), Snow created a map showing each death as a black line outside each household, along with features of the neighborhood including the water sources located throughout the district. The map created a visual picture of the concentration of deaths across the district and led to hypotheses about cholera spreading by waterborne contamination rather than smell. (If you were to walk across the same London district today, you would see that the great social theorist Karl Marx lived just a few streets away from the center of the cholera deaths.)
Figure 1.2 shows Snow’s map. You can see that near the center of the map is the “Broad Street Pump,” which Snow determined to be the source for the spread of cholera. (At the time, Karl Marx lived on Dean Street, just to the east of the Broad Street Pump.) Notice that the houses nearest this pump recorded the highest numbers of deaths.
The example in Figure 1.2 not only shows how descriptive statistics underscored the use of visual means of representing data; it also shows how such a display helped to clarify possible reasons for an epidemic. Graunt’s tables based on the Bills of Mortality were rudimentary visuals, but Snow’s map was a more effective means of portraying complex data by visual means. A still later statistician made even greater advancements in using visual information to communicate trends in data.
Figure 1.3 Florence Nightingale’s polar chart comparing battlefield and nonbattlefield deaths. Source: https://en.wikipedia.org/wiki/Pie_chart#/media/File:Nightingale-mortality.jpg. Public domain.
Nightingale (1858) is most often remembered as the founder of modern nursing. She is often represented in paintings as “the lady with the lamp,” since she was known to walk among the bedsides checking on the sick and wounded of the war. But Nightingale was also an astute statistician who used statistics to capture the dramatic need in hospitals during the Crimean War. She is credited as being one of the first to use a “pie chart” (more accurately, a “polar chart”). Figure 1.3 shows comparisons in her original polar chart of differences between soldiers who died of battlefield wounds (“red” wedges near the center) and those who died from other causes (“blue” wedges measured from the center of the graph) over time. The relationship between these groups fueled Nightingale’s efforts to obtain further funding for sanitary hospital conditions since those who died of infections were greater in number than those dying of battlefield wounds.
APPROACH OF THE BOOK
Many students and researchers are intimidated by statistical procedures, which may be due to fear of math, problematic math teachers in earlier education, or the lack of exposure to a “discovery” method for understanding difficult procedures. This book is an introduction to understanding statistics in a way that allows students to discover patterns in data and develop skill at making interpretations from data analyses. I describe how to use statistical programs (SPSS and Excel) to make the study more understandable and to teach students how to approach problem solving. Ordinarily, a first course in statistics leads students through the worlds of descriptive and inferential statistics by highlighting the formulas and sequential procedures that lead to statistical decision making. We will do all this in this book, but I place a good deal more attention on conceptual understanding. Thus, rather than memorizing a specific formula and using it in a specific way to solve a problem, I want to make sure the student first understands the nature of the problem, why a specific formula is needed, and how it will result in the appropriate information for decision making.
By using statistical software, we can place more attention on understanding how to interpret findings. Statistics courses taught in mathematics departments, and in some social science departments, often place primary emphases on the formulas/processes themselves. In the extreme, this can limit the usefulness of the analyses to the practitioner. My approach encourages students to focus more on how to understand and make applications of the results of statistical analyses. SPSS and other statistical programs are much more efficient at performing the analyses; the key issue in my approach is how to interpret the results in the context of the research question.
Beginning with my first undergraduate course teaching statistics with conventional textbooks, I have spent countless hours demonstrating how to conduct statistical tests manually and teaching students to do likewise. This is not always a bad strategy; performing the analysis manually can lead the student to understand how formulas treat data and yield valuable information. However, it is often the case that the student gravitates to memorizing the formula or the steps in an analysis. Again, there is nothing wrong with this approach as long as the student does not stop there. The outcome of the analysis is more important than memorizing the steps to the outcome. Examining the appropriate output derived from statistical software shifts the attention from the nuances of a formula to the wealth of information obtained by using it.
It is important to understand that I do indeed teach the student the nuances of formulas, understanding why, when, how, and under what conditions they are used. But in my experience, forcing the student to scrutinize statistical output files accomplishes this and teaches them the appropriate use and limitations of the information derived.
Students in my classes are always surprised (ecstatic) to realize they can use their textbooks and notes on my exams. But they quickly find that, unless they really understand the principles and how they are applied and interpreted, an open book is not going to help them. Over time, they come to realize that the analyses and the outcomes of statistical procedures are simply the ingredients for what comes next: building solutions to research problems. Therefore, their role is more detective and constructor than number juggler.
This approach mirrors the recent national and international debate about math pedagogy. In our recent book, Winning the Math Wars (2010), my colleagues and I addressed these issues in great detail, suggesting that, while traditional ways of teaching math are useful and important, the emphases of reform approaches are not to be dismissed. Understanding and memorizing detail are crucial, but problem solving requires a different approach to learning.
CASES FROM CURRENT RESEARCH
I focus on using real-world data in this book. There are several reasons for doing so, primarily because students need to be grounded in approaches for using data from the real world with all their problems and “grittiness.” When people respond to surveys or interviews, they inevitably fill out information in ways not asked by interviewers (e.g., respondents may choose two possible answers when one is required, etc.). Moreover, transferring data to electronic form may result in miscoded responses or categorization problems. Researchers always confront these issues, and I believe it is important for students to leave the classroom aware of the range of possible problems with real-world data and prepared for dealing with them. Of course, much of the data we will examine will already have been put in standard forms, but other research issues will arise (e.g., how do I recategorize data, assign missing cases, compute new variables, etc.?).
Another reason I use real-world data is to familiarize students with contemporary research questions in the social and health science fields. Classroom data often are contrived to make a certain point or show a specific procedure, which are both helpful. But I believe it is important to draw the focus away from the procedure per se and understand how the procedure will help the researcher resolve a research question. The research questions are important. Policy reflects the available information on a research topic, to some extent, so it is important for students to be able to generate that information as well as to understand it. This is an “active” rather than “passive” learning approach to understanding statistics.
Data Labs are a very important part of this course since they allow students to take charge of their learning. This is the heart of discovery learning. Understanding a statistical procedure in the confines of a classroom is necessary and helpful. However, learning that lasts is best accomplished by students directly engaging the processes with actual data and observing what patterns emerge in the findings that can be applied to real research problems.
Some practice problems may use data created for classroom use, but real-world data from actual research databases will enable a deepening of understanding. In addition to national databases, I use results from my own research for classroom learning. In every case, researchers know that they will discover knotty problems and unusual, sometimes idiosyncratic, information in their data. If students are not exposed to this real-world aspect of research, it will be confusing when they engage in actual research beyond the confines of the classroom.
In this course, we will have several occasions to complete Data Labs that pose research problems with actual data. Students take what they learn from the book material and conduct a statistical investigation using SPSS and Excel. Then, they have the opportunity to examine the results, write research summaries, and compare findings with the solutions presented at the end of the book.
The project labs also introduce students to two software approaches for solving statistical problems. These are quite different in many regards, as we will see in the chapters that follow. SPSS provides additional advanced procedures that educational researchers utilize for more complex and extensive research questions. Excel is widely accessible and provides a wealth of information to researchers about many statistical processes they encounter in actual research. The Data Labs provide solutions in both formats so the student can learn the capabilities and approaches of each.
This book makes use of publicly available research data. The General Social Survey or GSS¹ is a nationally representative survey designed to be part of a program of social research to monitor changes in Americans’ social characteristics and attitudes. Funded through the National Science Foundation and administered by the National Opinion Research Center (NORC), the GSS has been administered annually or biennially since 1972. As a general survey, the GSS asks a variety of questions on a series of topics designed to track the opinions of Americans over the last four decades. Other databases we will use in the book include the following:
• The Centers for Disease Control and Prevention (CDC) conducts the Behavioral Risk Factor Surveillance System (BRFSS) as a health-related telephone survey to measure American residents’ health conditions, health behaviors, and use of preventative services.²
• The Association of Religion Data Archives (ARDA) presents a series of databases on a variety of religion topics from the sociological perspective. In addition to other databases, the ARDA presents GSS databases on special modules (sets of questions) relevant to religion. By visiting the ARDA (www.thearda.com), you can peruse the codebook for the latest GSS file (www.thearda.com/Archive/GSS.asp) to get a fuller sense of the types of questions a general survey asks. You can also visit the ARDA’s “Learning Center” to take a survey that allows you to compare yourself to a larger national profile. The “Compare Yourself to the Nation” survey allows you to see how you compare to others based on the results from the 2005 Baylor Religion Survey (addressing religious identity, beliefs, experiences, paranormal views, etc.).
¹ Tom W. Smith, Peter Marsden, Michael Hout, and Jibum Kim. General Social Surveys, 1972–2012 [machine-readable data file] / Principal Investigator, Tom W. Smith; Coprincipal Investigator, Peter V. Marsden; Coprincipal Investigator, Michael Hout; Sponsored by National Science Foundation. NORC ed. Chicago: National Opinion Research Center [producer]; Storrs, CT: The Roper Center for Public Opinion Research, University of Connecticut [distributor], 2013. 1 data file (57,061 logical records) + 1 codebook (3432 pp.). (National Data Program for the Social Sciences, No. 21).
² Centers for Disease Control and Prevention (CDC) (2013). Behavioral Risk Factor Surveillance System Survey Data. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention.
RESEARCHDESIGN Researcherswhowritestatisticsbookshaveadilemmawithrespecttoresearch design.Typically,statisticsandresearchdesignaretaughtseparatelyinorderfor studentstounderstandeachingreaterdepth.Thedifficultywiththisapproachisthat thestudentisleftontheirowntosynthesizetheinformation;thisisoftennotdone successfully.
Collegesanduniversitiesattempttomanagethisproblemdifferently.Somerequire statisticsasaprerequisiteforaresearchdesigncourseorviceversa.Othersattemptto synthesizetheinformationintoonecourse,whichisdifficulttodogiventheeventual complexityofboth“sets”ofinformation.Addingsomewhattotheproblemisthe approachofmultiplecoursesinbothdomains.
I do not offer a perfect solution to this dilemma. My approach focuses on an in-depth understanding of statistical procedures for actual research problems. What this means is that I cannot devote a great deal of attention in this book to research design apart from the statistical procedures which are an integral part of it. (You may wish to consult a separate book on research design I authored with my colleague Jennifer McKinney, Understanding and Applying Research Design, 2013.)
I try to address the problem in two ways. First, wherever possible, I connect statistics with specific research designs. This provides an additional context in which students can focus on using statistics to answer research questions. The research question drives the decision about which statistical procedures to use; it also calls for discussion of appropriate design in which to use the statistical procedures. We will cover essential information about research design in order to show how these might be used.
Second, I have an online course in research design that can be accessed to continue your exploration from this book. In addition to databases and other research resources, you can follow the web address in the preface to gain access to the online course as additional preparation in research design.
FOCUS ON INTERPRETATION
I call attention to problem solving and interpretation as the important elements of statistical analysis. It is tempting for students to focus so much on using statistical procedures to create meaningful results (a critical matter!) that they do not focus on what the results mean for the research question. They stop after they use a formula and decide whether or not a finding is statistically significant. I strongly encourage students to think about the findings in the context and words of the research question. This is not an easy thing to do because the meaning of the results is not always cut and dried. It requires students to think beyond the formula.
Statisticians and practitioners have devised rules to help researchers with this dilemma by creating criteria for decision making. For example, as we will see in Chapter 11, squaring a correlation yields the “coefficient of determination,” which represents the amount of variance in one variable that is accounted for by the other variable (this is known as “effect size,” a topic which we will spend a great deal of time with in this book). But the next question is, how much of the “accounted for variance” is meaningful? This consideration is key to understanding how to use and make decisions on the basis of big data.
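The squaring step can be illustrated with a few lines of code. This is only a minimal sketch in Python (a language chosen for illustration; it is not one of the tools used in this book), and the correlation values are hypothetical. The function name `coefficient_of_determination` is likewise an assumption made for the example.

```python
def coefficient_of_determination(r: float) -> float:
    """Square a correlation to get the share of variance accounted for."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Hypothetical correlations, chosen only for illustration.
for r in (0.10, 0.30, 0.50):
    share = coefficient_of_determination(r)
    print(f"r = {r:.2f} -> r squared = {share:.2f} ({share:.0%} of the variance)")
```

Note how quickly the accounted-for variance shrinks: a correlation has to be fairly large before its square represents a sizable share of the variance, which is why the question of meaningfulness still has to be answered in context.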
In many ways, interpretation of results is an art undergirded by the canons of science. Much of the ability to develop expertise in interpretation comes by long hours of tutelage with researchers who have done it for many years. We cannot hope to emerge from our study with this expertise, but through constant focus on interpretation, we can become aware of the acceptable ways of understanding and using statistical results.
Statisticians have suggested different ways of helping with interpretation. For example, when dealing with the “accounting of variance” example presented earlier, statisticians have created criteria under which 0.01 (1%) of the variance accounted for is considered “small” while 0.05 (5%) is “medium” and so forth. (And, much to the dismay of many students, there is more than one set of these criteria.) Therefore, if we determine that the correlation between two variables reaches these criteria levels, we can feel secure in sticking to good interpretation guidelines. Problems exist, however, in how to view these statistical results within the context of the research problem.
For example, if a research question is, “Does class size affect math achievement?” and the results suggest that class size accounts for 1% of the variance in math achievement, many researchers might agree the results represent a small and perhaps even inconsequential impact. However, if a research question is, “Does drug X affect Ebola survival rates?,” researchers might consider 1% of the variance to be much more consequential than “small!” This is not to say that math achievement is any less important than Ebola survival rates (although that is another of those debatable questions researchers face), but the researcher must consider a range of factors in determining meaningfulness: the intractability of the research problem, the discovery of new dimensions of the research focus, whether or not the findings represent life and death, and so on. The material point is that statistical criteria are important for establishing meaningfulness of results, but overall interpretation involves the larger context within which the research takes place.
I have found that students have the most difficult time with these matters. Students often find using a formula to create numerical results much preferable to understanding what the results mean in the context of the research question. They have been conditioned to stop after they get the right numerical answer. They typically do not get to the difficult work of what the right answer means because it isn’t always apparent.
I emphasize “practical significance” (effect size) in this book as well as statistical significance. In many ways, this is a more comprehensive approach to uncertainty, since effect size is a measure of “impact” in the research evaluation. It is important to measure the likelihood of chance findings (statistical significance), but the extent of influence represented in the analyses affords the researcher another vantage point to determine the relationship among the research variables.
Coverage of Statistical Procedures
The statistical applications we will discuss in this book are “workhorses.” This is an introductory treatment, so we need to spend time discussing the nature of statistics and basic procedures that allow you to use more sophisticated procedures. We will not be able to examine advanced procedures in much detail. I will provide some references for students who wish to continue their learning in these areas. Hopefully, as you learn the capability of SPSS and Excel, you can explore more advanced procedures on your own, beyond the end of our discussions.
Some readers may have taken statistics coursework previously. If so, my hope is that they are able to enrich what they previously learned and develop a more nuanced understanding of how to address problems in educational research through the use of SPSS and Excel. Whether readers are new to the study or experienced practitioners, my hope is that statistics becomes meaningful as a way of examining problems and debunking prevailing assumptions in the social and health sciences.
Often, well-intentioned people can, through ignorance of appropriate processes, promote ideas that may not be true. Further, policies might be offered that would have a negative impact because they were not based on sound statistical analyses. Statistics are tools that can be misused and influenced by the value perspective of the wielder. However, policies are often generated in the absence of compelling research. Students need to become “research literate” in order to recognize when statistical processes should be used and when they are being used incorrectly.
2 DESCRIPTIVE STATISTICS: CENTRAL TENDENCY
When I teach statistics, I typically begin by offering a series of questions that emphasize the importance of statistics for solving real research problems. Statistical formulas and procedures are logical and crucial, but the primary function for statistical analyses (at least, in my mind) is to bring clarity and understanding to a research question. As I discussed in a recent book dealing with statistics for program evaluation (Abbott, 2010), statistical procedures are best used to discover patterns in the data that are not directly observable. Bringing light to these patterns allows the student and the researcher to understand and engage in problem solving.
WHAT IS THE WHOLE TRUTH? RESEARCH APPLICATIONS (SPURIOUSNESS)
Finding the “truth” is a laudable goal and one that should inform all research efforts. However, in statistics, it is not likely that we will ever really discover ultimate truth. The nature of statistics is that we strive to observe as fully as possible what relationships exist among variables so that we can understand likely causal linkages. Does poverty “cause” crime? Is longevity affected by access to healthcare? These questions intimate valid relationships between the research variables. However, one of the first lessons in statistics and research is that valid and meaningful relationships are not always easily visible. Certainly most realities in contemporary life are much more complex than can be explained by two variables. We therefore must be able to “see” patterns among data using both numerical and visual means that underlie seemingly simple relationships.
Using Statistics in the Social and Health Sciences with SPSS® and Excel®, First Edition. Martin Lee Abbott. © 2017 John Wiley & Sons, Inc. Published 2017 by John Wiley & Sons, Inc.
As we will discuss in Chapter 11, there is a big difference between “correlation” and “causation.” This statistical adage helps to point out the complexity of understanding the patterns among variables. Just because two variables are strongly statistically related does not mean that there is a causal relationship between them. Causality is difficult to prove. In order to understand the apparent causal relationship more fully, we must look at other variables that might have a meaningful but “hidden” relationship with both “visible” variables. Researchers use the term “spuriousness” to describe whether an apparent relationship between two variables might be the influence of variables not in the analysis. An example of spuriousness is the relationship between ice cream consumption and crime.1
There is a positive relationship between rates of ice cream consumption and crime; when one increases, so does the other. Should we conclude then that ice cream consumption leads to criminal behavior in a causal way? Spuriousness means that there may not be a true or genuine relationship between factors even if it looks like there is. Some unobserved or unnoticed variable may be related to both of the variables we can “see” (in this example ice cream consumption and crime), which may make it appear that the “visible” variables have a cause–effect relationship.
In this example, ice cream consumption increases as crime increases; and, consequently, when crime increases, so does the consumption of ice cream. These two variables appear to be consistently related to each other. They probably do not have a causal relationship, however, since both ice cream consumption and crime are related to a third factor: temperature. When temperatures rise, ice cream consumption increases (people eat more ice cream in the summer than winter). Also, when temperatures rise, crime increases. If we include these additional relationships in our study, then we can see that the apparent causal relationship between ice cream consumption and crime is probably really more an issue of the weather; both of the variables are “linked” by temperature.
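The logic of this example can be sketched with a small simulation. The Python code below is an illustration under invented assumptions, not an analysis of real data: temperature is drawn at random, and both ice cream consumption and crime are generated as temperature plus unrelated noise, so neither one influences the other. The partial correlation used here is a standard way of holding a third variable constant; the function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after holding z constant."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 2000

# Simulated, standardized values: both outcomes depend only on temperature.
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [t + rng.gauss(0, 1) for t in temperature]
crime = [t + rng.gauss(0, 1) for t in temperature]

raw = pearson(ice_cream, crime)
controlled = partial_correlation(ice_cream, crime, temperature)

print(f"ice cream vs. crime, ignoring temperature: {raw:.2f}")
print(f"ice cream vs. crime, holding temperature constant: {controlled:.2f}")
```

In this simulated setting the two "visible" variables are clearly correlated until temperature is taken into account, at which point the relationship largely vanishes. Real data are rarely this tidy, but the pattern is the one the example describes.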
Without considering spuriousness, some might be tempted to explain why there is a causal relationship between ice cream consumption and crime. For example, does ice cream lead to feelings of grandeur or a propensity for aggression, which causes people to commit crime? Or is it that good ice cream is so expensive that people commit crimes in order to support their ice cream habit? Which makes most sense? Although we could come up with several reasons (mostly fanciful) why one of these variables might be causally related to the other, we need to be cautious.
This situation leads to one of the most profound lessons in social science: objectivity is necessary to pursue knowledge dispassionately. If we assume there is a relationship between things without using objective means of assessing the truth of the situation, then we are simply imposing a subjective understanding of the situation that is not “anchored” in science. Some call this the “procrustean exercise” referencing the mythological figure who forced people to an iron bed by either stretching them to fit or cutting off the excess. Thus, by not taking an objective stance, we may have a tendency to make apparent reality “fit” our mental picture or subjective assumptions.
1 This example and explanation are discussed in Abbott and McKinney (2013).
Figure 2.1 The possible spurious relationship between ice cream consumption and crime.
Figure 2.1 shows the possible relationships among ice cream consumption, crime, and temperature. The top panel shows the apparent relationship between ice cream consumption and crime, with a two-way line connecting the variables indicating that the two are highly related to one another. The bottom panel shows that, when the third variable (temperature) is introduced, the apparent relationship between ice cream consumption and crime disappears, as indicated by the absence of a line connecting them.
Identifying potentially spurious relationships is often quite difficult and comes only after extended research. The researcher must know their data intimately in order to make the discovery. An example of this is a study of industrial democracy I conducted several years ago. It was generally accepted in industry at the time that, if workers were given the ability to participate in decision making, they would have higher job satisfaction (JS). This was a reasonable assumption, given similar findings in the research literature. However, the more I examined my own data from workers in an electronic industry, the more I questioned this assumption and decided to explore the matter further.
I noticed from interviews that many workers did not want to participate in decision making, even though they had the opportunity to do so. I therefore analyzed the original “participation–job satisfaction” relationship but this time added variables that measured workers’ attitudes toward their work and a desire for management. Through a series of analyses, I found a number of surprising results that “modified” the original assumption of a direct (and causal) relationship between participation and JS. One of these findings was that a worker’s attitude toward management had a lot to do with their eventual satisfaction levels. Those workers who participated in decision making and who had a positive view of management showed stronger satisfaction than those workers who did not have such a positive view of management. Thus, a third variable (view of management) that was not originally included in the simple relationship (participation–satisfaction) had an impact on the findings. This subsequent analysis discovered a pattern in the data that was not “visible” at the outset.
The popular press often presents research findings that are somewhat bombastic but might possibly be spurious. Is student achievement really just a matter of ethnicity, or are there other factors involved (e.g., family income)? Do lifestyle choices directly impact longevity, or are there other considerations that need to be taken into account (e.g., social class)? The value of statistics is that it equips the student and researcher with the skills necessary to debunk simplistic findings.
DESCRIPTIVE AND INFERENTIAL STATISTICS
Statistics, like other courses of study, is multifaceted. It includes “divisions” that are each important in understanding the whole. Two major divisions are descriptive and inferential statistics. Descriptive statistics are methods to summarize and “boil down” the essence of a set of information so that it can be understood more readily and from different vantage points. We live in a world rich with data; descriptive statistical techniques are ways of making sense of it. Using these straightforward methods allows the researcher to detect numerical and visual patterns in data that are not immediately apparent.
Inferential statistics are a different matter altogether. These methods allow you to make predictions about attitudes, behaviors, and patterns on a large scale based on small sets of “sample” values. In real life, we are presented with situations that cannot provide us with certainty: Would a national training method improve patients’ satisfaction ratings of their physicians? Can we predict workers’ health scores or longevity in a variety of industries based on their job positions? Inferential statistics allow us to infer or make an observation about an unknown value from sample values that are known. Obviously, we cannot do this with absolute certainty; we do not live in a totally predictable world. But we can do it within certain bounds of probability. Hopefully, statistical procedures will allow us to get closer to certainty than we could get without them.
THE NATURE OF DATA: SCALES OF MEASUREMENT
The first step in understanding complex relationships like the ones I described earlier is to be able to understand and describe the nature of what data are available to a researcher. We often jump into a research analysis without truly understanding the features of the data we are using. Understanding the data is a very important step because it can reveal hidden patterns and it can suggest custom-made statistical procedures that will result in the strongest findings.
One of the first realizations by researchers is that data come in a variety of sizes and shapes. That is, researchers have to work with available information to make statistical decisions and that information takes many forms. For instance, students are identified as either “qualified” or “not qualified” for free or reduced lunches. Consider three further examples:
1. Workers either “desire participation” or “do not desire participation.”
2. Job satisfaction is measured by worker responses to several questionnaire items asking them to “Agree Strongly,” “Agree,” “Neither Agree nor Disagree,” “Disagree,” or “Disagree Strongly.”
3. Medical researchers measure workers’ physical health by how many days during the last month their physical health was good.
Nominal Data
The first example shows that data can be “either–or” in the sense that they represent mutually exclusive categories. If a worker indicates that they “desire participation” on a survey instrument, for example, they would not fit the “do not desire participation” category. Other examples of “categorical” data are sex (male and female) and experimental groups (treatment or control).
This type of data, called “nominal,” does not represent a continuum with intermediate values. Each value is a separate category only related by the fact they are categories of some larger value (e.g., male and female are both values of sex). These data are called nominal since the root of the word indicates “names” of categories. They are also appropriately called “categorical” data.
The examples of nominal data just mentioned can also be classified as “dichotomous” since they are nominal data that have only two categories. Nominal data also include variables with more than two categories such as schooling (e.g., public, private, homeschooling). We will discuss later that dichotomous data can come in a variety of forms also, like “true dichotomies” in which the categories naturally occur like sex, and “dichotomized variables” that have been created by the researcher from some different kind of data (like satisfied and not satisfied workers). In all cases, nominal data represent mutually exclusive categories. Educators typically confront nominal data in classifying students by gender or race, or, if they are conducting research, they classify groups as “treatment” and “control.”
In order to quantify the variables, researchers assign numerical values to the categories. For example, “treatment groups” might be assigned a value of “1” and “control groups” might be assigned a value of “2.” In these cases, the numbers are only categories; they do not represent actual measurements. Thus, a control group is not twice a treatment group. The numbers are only a convenient way of identifying the different categories.
Because nominal data are categorical, we cannot apply the mathematical operations of addition, subtraction, multiplication, and division to the category codes. It would make no sense to divide the number of Jeeps in a parking lot (one category) by the number of Teslas in the same parking lot (second category) to get a single measure of the automobiles. In order to get an idea of the automobiles in the parking lot, researchers would need to identify the categories of automobiles and find the percentage of each category in the parking lot. Thus, we might say that there are 15% Jeeps, 2% Teslas, 29% Toyotas, and so on in the parking lot.
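Counting categories and converting the counts to percentages is easy to sketch in code. The Python example below is only an illustration (Python is not one of the tools used in this book), and the parking lot it describes is invented: the makes and counts are hypothetical, as is the function name `category_percentages`.

```python
from collections import Counter

# Hypothetical parking lot; the makes and counts are invented for illustration.
cars = ["Jeep"] * 3 + ["Tesla"] * 1 + ["Toyota"] * 6


def category_percentages(values):
    """Return the percentage of observations falling in each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


percentages = category_percentages(cars)
for make, pct in sorted(percentages.items()):
    print(f"{make}: {pct:.0f}%")
```

The only arithmetic performed is on the counts of cases in each category. The category labels themselves (or any numeric codes standing in for them) are never added or averaged.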
Ordinal Data
The second example listed in the previous section (THE NATURE OF DATA: SCALES OF MEASUREMENT) indicates another kind of data: ordinal data. These are data with a second characteristic of meaning, position. These data are also categories, as in nominal data, but with the categories related by “more than” and “less than.” Some categories are placed above in value or below in value of some other category.
Medical researchers typically find ordinal data in many places: county surveys regarding citizens’ health and preference for treatment options, for example. In these cases, one person’s response can be more or less than another person’s on the same measure. According to our earlier discussion, JS can be measured by a question that workers answer about their work like the following:
“I am happy with the work I do.”
1. Agree Strongly (SA)
2. Agree (A)
3. Neither Agree nor Disagree (N)
4. Disagree (D)
5. Disagree Strongly (SD)
As you can see, one worker can be quite happy, which indicates “Agree Strongly,” while another can report that they are a little less happy by indicating “Agree.” The two workers are reporting different levels of happiness, with one being happier than the other.
Figure 2.2 shows another example of ordinal data categories; this example is from the BRFSS Codebook, in which medical researchers assigned numbers to respondents’ reported health.2
As you can see in Figure 2.2, the response categories (“Excellent,” “Very good,” etc.) are still categories, but they are linked by “gradual amounts” of agreement.
Variable Name: GENHLTH
Description: Would you say that in general your health is:

Value   Value Label             Frequency   Percentage   Weighted Percentage
1       Excellent                  85,532        17.39                 18.66
2       Very good                 159,104        32.35                 31.68
3       Good                      150,548        30.61                 31.11
4       Fair                       66,700        13.56                 13.31
5       Poor                       27,909         5.68                  4.76
7       Don’t know/not sure           969         0.20                  0.18
9       Refused                     1,004         0.20                  0.29
BLANK   Not asked or missing            7

Figure 2.2 The BRFSS GENHLTH variable values.
2 Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Questionnaire. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2013.
TABLE 2.1 Typical Ordinal Response Scale
According to the data shown in Figure 2.2, 17.39% of the respondents rated their health as excellent, while 5.68% of respondents rated their health as poor.
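The percentages quoted from Figure 2.2 can be reproduced from the frequencies. The following Python sketch is only an illustration (Python is not one of the tools used in this book). It assumes that each percentage is the category frequency divided by the total of the seven listed categories, leaving out the 7 blank records; that assumption is mine, although it reproduces the unweighted percentages shown in the figure.

```python
# Frequencies copied from Figure 2.2 (GENHLTH); the 7 blank records are left out,
# which is an assumption about how the unweighted percentages were computed.
frequencies = {
    "Excellent": 85532,
    "Very good": 159104,
    "Good": 150548,
    "Fair": 66700,
    "Poor": 27909,
    "Don't know/not sure": 969,
    "Refused": 1004,
}

total = sum(frequencies.values())

# Percentage of respondents choosing each category, rounded to two decimals.
percentages = {
    label: round(100 * count / total, 2) for label, count in frequencies.items()
}

for label, pct in percentages.items():
    print(f"{label:<22}{frequencies[label]:>9,}{pct:>8.2f}%")
```

Note that these are counts of people in ordered categories. Nothing in the calculation treats the codes 1 through 5 as quantities.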
These examples of survey data are the stock-in-trade of social scientists because they provide such a convenient window into people’s thinking. Medical, health, and social researchers use them constantly for gaining insight into, and making decisions about, policies in healthcare, urban planning, worker democracy, education, and other related arenas.
There is a difficulty with these kinds of data for the researcher, however. Typically, the researcher needs to provide a numerical referent for a person’s response to different questionnaire response categories in order to examine and describe the set of responses. Therefore, they assign numbers to the response categories as shown in Table 2.1.
The difficulty arises when the researcher treats the numbers (1–5 in Table 2.1) as integers rather than ordinal indicators. If the researcher thinks of the numbers as integers, they typically create an average rating on a specific questionnaire item for a group of respondents. Thus, assume, for example, that four people responded to the questionnaire item above (“I am happy with the work I do”) with the following results: 2, 4, 3, 1 (i.e., person 1 “Agrees,” receiving a 2 for “Agree”; person 2 “Disagrees”; person 3 is “Neutral”; and person 4 “Strongly Agrees”). The danger is in averaging these by adding them together and dividing by four to get 2.5, as follows: (2 + 4 + 3 + 1)/4 = 2.5. This result would mean that on average, all four respondents indicated an agreement halfway between the 2 and the 3 (and therefore halfway between “Agree” and “Neutral”). This assumes that each of the numbers has an equal distance between them, that is, that the distance between 4 and 3 is the same as the distance between 1 and 2. This is what the scale in Table 2.1 looks like if you simply think of the numbers as integers.
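The four-person example can be worked through in a few lines of code. This Python sketch is only an illustration (Python is not one of the tools used in this book). It computes the integer average described above and, for contrast, two summaries that rely only on the order of the categories: a frequency count and the lower median. Offering these as alternatives is my suggestion for the sketch, not a procedure prescribed here.

```python
from collections import Counter
from statistics import mean, median_low

# Response codes for the item "I am happy with the work I do."
labels = {
    1: "Agree Strongly",
    2: "Agree",
    3: "Neither Agree nor Disagree",
    4: "Disagree",
    5: "Disagree Strongly",
}

responses = [2, 4, 3, 1]  # the four respondents from the example

# Treating the codes as integers: (2 + 4 + 3 + 1) / 4
integer_mean = mean(responses)
print(f"Integer average: {integer_mean}")

# Summaries that use only the ordering of the categories.
counts = Counter(responses)
for code in sorted(labels):
    print(f"{code} {labels[code]}: {counts.get(code, 0)} respondent(s)")

middle = median_low(responses)
print(f"Lower median category: {middle} ({labels[middle]})")
```

The integer average of 2.5 corresponds to no response category at all, whereas the count of respondents per category and the median category stay within what the scale actually measures.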
However, an ordinal scale makes no such assumptions. Ordinal data only assume that a 4 is greater than a 3, or a 3 is greater than a 2, but not that the distances between the numbers are the same. Table 2.2 shows a comparison between how an ordinal scale appears and how it might actually be represented in the minds of two different respondents.
TABLE 2.2 Perceived Distances in Ordinal Response Items
According to Table 2.2, respondent 1 is the sort of person who is quite certain when they indicate SA. This same person, however, makes few distinctions between A and N and between D and SD (but they are certain that any disagreement is quite a distance from agreement or neutrality). Respondent 2, by contrast, doesn’t make much of a distinction between SA, A, and N, but seems to make a finer distinction between areas of disagreement, indicating stronger feelings about how much further SD is from D.
Hopefully this example helps you to see that the numbers on an ordinal scale do not represent an objective distance between the numbers, but they are only indicators of ordinal categories and can differ between people on the same item. The upshot, for research, is that you cannot add the numbers and divide by the number of responses to get an average because the distances between the numbers may be different for each respondent! Creating an average would then be based on different meanings of the numbers and would not accurately represent how all the respondents, as a group, responded to the item.
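A small numerical sketch can make this point concrete. The Python code below is purely illustrative (Python is not one of the tools used in this book). The "perceived positions" are invented numbers that loosely follow the verbal description of the two respondents; they are not the values in Table 2.2, and the two groups of responses are hypothetical as well.

```python
# Hypothetical "perceived" positions for the five response codes (1 = SA ... 5 = SD).
# These numbers are invented for illustration; they are not the values in Table 2.2.
spacings = {
    "equal_integers": {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0},
    "respondent_1": {1: 1.0, 2: 3.0, 3: 3.2, 4: 6.0, 5: 6.2},
    "respondent_2": {1: 1.0, 2: 1.2, 3: 1.4, 4: 3.0, 5: 6.0},
}

# Two hypothetical groups of four responses to the same item.
group_a = [2, 2, 4, 4]
group_b = [3, 3, 3, 3]


def average_position(codes, spacing):
    """Average of the perceived positions that a spacing assigns to the codes."""
    positions = [spacing[code] for code in codes]
    return sum(positions) / len(positions)


averages = {
    name: (average_position(group_a, spacing), average_position(group_b, spacing))
    for name, spacing in spacings.items()
}

for name, (a, b) in averages.items():
    print(f"{name}: group A = {a:.2f}, group B = {b:.2f}")
```

When the codes are treated as equally spaced integers, the two groups have identical averages. Under either of the unequal spacings, the groups no longer tie, even though no response has changed. The conclusion drawn from an average therefore depends on distances that an ordinal scale does not actually provide.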
Interval Data
The majority of the procedures we will study in this book use interval data. These data are numbers that have the properties of nominal and ordinal data, but add another characteristic, equal distance between the numbers. Interval data are numbers that have equal distance between them, so that the difference between 90 and 91 is the same as the distance between 103 and 104; in both cases, the difference is one unit. The value of this assumption is that you can use mathematical operations (multiplication, addition, subtraction, and division) to analyze the numbers because they have equal distances. Interval data are also “continuous” since an interval variable is expressed through a large number of equal distance measures.
An example of an interval scale is a standardized assessment test such as an intelligence quotient (IQ) test. A standardized test is one that meets strict criteria for testing and can ensure strong validity and reliability. These tests are benchmarked by having been used with a number of different sets of respondents under the same directions, with the same materials, time, and general conditions. They also typically have published norms so that researchers can have an objective measure against which to compare the results of the respondents of their own study.
While psychologists and educational researchers disagree about what IQ really represents, the numbers nevertheless share the equal distance property. With IQ, or other standardized tests, the respondent indicates their answers to a set of questions designed to measure the characteristic or trait studied. Since the IQ measure has been used and benchmarked with so many different groups of people over the decades, the scores come to have the property of equal intervals between them.
JS is another example. Respondents usually indicate that they strongly agree, agree, etc., with a series of items measuring their attitudes toward their job. The Job Diagnostic Survey (JDS) (Hackman and Oldham, 1980) includes the following item as part of the measurement of JS: “I am generally satisfied with the kind of work I do in this job” (response scale is “Disagree Strongly,” “Disagree,” “Disagree Slightly,” “Neutral,” “Agree Slightly,” “Agree,” and “Agree Strongly”). The measurement of JS uses a series of these kinds of questions to measure a worker’s attitude toward their job.