Using R for Data Analysis in Social Sciences: A Research Project-Oriented Approach
Quan Li
Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trademark of Oxford University Press in the UK and in certain other countries.
Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America.
© Oxford University Press 2018
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.
You must not circulate this work in any other form and you must impose this same condition on any acquirer.
Library of Congress Cataloging-in-Publication Data
Names: Li, Quan, 1966– author.
Title: Using R for data analysis in social sciences: a research project-oriented approach / Quan Li.
Description: New York, NY: Oxford University Press, [2018]
Identifiers: LCCN 2017010031 | ISBN 9780190656225 (pbk.) | ISBN 9780190656218 (hardcover) | ISBN 9780190656232 (updf) | ISBN 9780190656249 (epub)
Subjects: LCSH: Social sciences–Research–Data processing. | Social sciences–Statistical methods. | R (Computer program language)
Classification: LCC H61.3 .L52 2018 | DDC 330.285/5133–dc23
LC record available at https://lccn.loc.gov/2017010031
Paperback printed by WebCom, Inc., Canada
Hardback printed by Bridgeport National Bindery, Inc., United States of America
CONTENTS
List of Figures
List of Tables
Acknowledgments
Introduction
1. Learn about R and Write First Toy Programs
WHEN TO USE R IN A RESEARCH PROJECT
ESSENTIALS ABOUT R
HOW TO START A PROJECT FOLDER AND WRITE OUR FIRST R PROGRAM
CREATE, DESCRIBE, AND GRAPH A VECTOR: A SIMPLE TOY EXAMPLE
SIMPLE REAL-WORLD EXAMPLE: DATA FROM IVERSEN AND SOSKICE (2006)
CHAPTER 1: R PROGRAM CODE
TROUBLESHOOT AND GET HELP
IMPORTANT REFERENCE INFORMATION: SYMBOLS, OPERATORS, AND FUNCTIONS
SUMMARY
MISCELLANEOUS Q&AS FOR AMBITIOUS READERS
EXERCISES
2. Get Data Ready: Import, Inspect, and Prepare Data
PREPARATION
IMPORT PENN WORLD TABLE 7.0 DATASET
INSPECT IMPORTED DATA
PREPARE DATA I: VARIABLE TYPES AND INDEXING
PREPARE DATA II: MANAGE DATASETS
PREPARE DATA III: MANAGE OBSERVATIONS
PREPARE DATA IV: MANAGE VARIABLES
CHAPTER 2 PROGRAM CODE
SUMMARY
MISCELLANEOUS Q&AS FOR AMBITIOUS READERS
EXERCISES
3. One-Sample and Difference-of-Means Tests
CONCEPTUAL PREPARATION
DATA PREPARATION
WHAT IS THE AVERAGE ECONOMIC GROWTH RATE IN THE WORLD ECONOMY?
DID THE WORLD ECONOMY GROW MORE QUICKLY IN 1990 THAN IN 1960?
CHAPTER 3 PROGRAM CODE
SUMMARY
MISCELLANEOUS Q&AS FOR AMBITIOUS READERS
EXERCISES
4. Covariance and Correlation
DATA AND SOFTWARE PREPARATIONS
VISUALIZE THE RELATIONSHIP BETWEEN TRADE AND GROWTH USING SCATTER PLOT
ARE TRADE OPENNESS AND ECONOMIC GROWTH CORRELATED?
DOES THE CORRELATION BETWEEN TRADE AND GROWTH CHANGE OVER TIME?
CHAPTER 4 PROGRAM CODE
SUMMARY
MISCELLANEOUS Q&AS FOR AMBITIOUS READERS
EXERCISES
5. Regression Analysis
CONCEPTUAL PREPARATION: HOW TO UNDERSTAND REGRESSION ANALYSIS
DATA PREPARATION
VISUALIZE AND INSPECT DATA
HOW TO ESTIMATE AND INTERPRET OLS MODEL COEFFICIENTS
HOW TO ESTIMATE STANDARD ERROR OF COEFFICIENT
HOW TO MAKE AN INFERENCE ABOUT THE POPULATION PARAMETER OF INTEREST
HOW TO INTERPRET OVERALL MODEL FIT
HOW TO PRESENT STATISTICAL RESULTS
CHAPTER 5 PROGRAM CODE
SUMMARY
MISCELLANEOUS Q&AS FOR AMBITIOUS READERS
EXERCISES
6. Regression Diagnostics and Sensitivity Analysis
WHY ARE OLS ASSUMPTIONS AND DIAGNOSTICS IMPORTANT?
DATA PREPARATION
LINEARITY AND MODEL SPECIFICATION
PERFECT AND HIGH MULTICOLLINEARITY
CONSTANT ERROR VARIANCE
INDEPENDENCE OF ERROR TERM OBSERVATIONS
INFLUENTIAL OBSERVATIONS
NORMALITY TEST
REPORT FINDINGS
CHAPTER 6 PROGRAM CODE
SUMMARY
MISCELLANEOUS Q&AS FOR AMBITIOUS READERS
EXERCISES
7. Replication of Findings in Published Analyses
WHAT EXPLAINS THE GEOGRAPHIC SPREAD OF MILITARIZED INTERSTATE DISPUTES? REPLICATION AND DIAGNOSTICS OF BRAITHWAITE (2006)
DOES RELIGIOSITY INFLUENCE INDIVIDUAL ATTITUDES TOWARD INNOVATION? REPLICATION OF BÉNABOU ET AL. (2015)
CHAPTER 7 PROGRAM CODE
SUMMARY
8. Appendix: A Brief Introduction to Analyzing Categorical Data and Finding More Data
OBJECTIVE
GETTING DATA READY
DO MEN AND WOMEN DIFFER IN SELF-REPORTED HAPPINESS?
DO BELIEVERS IN GOD AND NON-BELIEVERS DIFFER IN SELF-REPORTED HAPPINESS?
SOURCES OF SELF-REPORTED HAPPINESS: LOGISTIC REGRESSION
WHERE TO FIND MORE DATA
References and Readings
Index
LIST OF FIGURES
1.1 How to Write First Toy Program in R
1.2 How to Install Add-on Package
1.3 Distribution of Discrete Variable vd$v1: Bar Chart
1.4 Distribution of Continuous Variable vd$v1: Boxplot and Histogram
1.5 Distribution of Wage Inequality from Iversen and Soskice (2006)
1.6 Distribution of PR and Majoritarian Systems from Iversen and Soskice (2006)
1.7 RStudio Screenshot
2.1 Using View() Function to View Raw Data
2.2 Distribution of Variable rgdpl
3.1 Types of Errors and Alternative Sampling Distributions
3.2 Histogram for Growth
3.3 Mean and 95% Confidence Interval for Growth
3.4 Mean and 95% Confidence Interval for Growth: 1960 and 1990
4.1 Simulated Positive Correlations of Two Random Variables
4.2 Scatter Plot of Trade Openness and Economic Growth
4.3 Correlation between Trade and Growth over Time
4.4 P Value of Correlation between Trade and Growth over Time
4.5 Anscombe Quartet Scatter Plot
5.1 Original Statistical Results from Frankel and Romer (1999)
5.2 Comparing Unlogged and Logged Income per Person
5.3 Trade Openness and Log of Income per Person
5.4 Coefficients Plot for Model 1
5.5 Partial Regression Plot
5.6 Explore Pairwise Relationships among Variables
6.1 Anscombe Quartet Regressions
6.2 Anscombe Quartet Residuals versus Fitted Values Plots
6.3 Diagnostic Plots for a Well-Behaved Regression
6.4 Residuals versus Fitted Values: Linearity
6.5 Residuals versus Independent Variables: Linearity
6.6 Trade Openness and Log of Income per Person
6.7 Distribution of Residuals by Region
6.8 Scatter Plot of Trade and Income by Region
6.9 Estimated Effect of Trade on Income by Region
6.10 Influence Plot of Influential Observations
6.11 Influential Observations Above Cook's D Threshold
6.12 Normality Assumption Diagnostic Plot
7.1 Regression Diagnostic Plot: Residuals versus Fitted Values
7.2 Diagnostic Plot for Influential Observations: Cook's D
7.3 Normality Assumption Diagnostic Plot
8.1 Sample Page from World Values Survey Codebook
LIST OF TABLES
1.1 Country Means for Variables Used in Regression Analysis (from Iversen and Soskice, 2006)
1.2 Statistics of Imported Data from Iversen and Soskice (2006)
1.3 Important Symbols in R
1.4 Arithmetic Operators
1.5 Logical Operators
1.6 Common Statistical and Mathematical Functions
2.1 List of Data Preparation Tasks and Related R Functions
3.1 Logic of Statistical Inference
3.2 Two-Sample Difference-of-Means Tests
5.1 Coefficient Interpretation in Log or Unlogged Models
5.2 Descriptive Statistics of Final Dataset
5.3 Effect of Trade Openness on Real Income per Person
6.1 Regression Results Using Anscombe's Quartet
6.2 Effect of Trade on Income: Robustness Checks Part I
6.3 Effect of Trade on Income: Robustness Checks Part II
7.1 Variable Measures and Expected Effects
7.2 OLS Regression of Dispute Dispersion (Original Statistical Results Table from Braithwaite, 2006)
7.3 Original Descriptive Statistics Table in Braithwaite (2006)
7.4 Causes of Spread of Military Disputes: Replication and Robustness Tests
7.5 Most Important Qualities for Children to Have (from Bénabou et al., 2015)
7.6 Variable Labels for Dataset in Bénabou et al. (2015)
7.7 Replicating Table 2 in Bénabou et al. (2015)
ACKNOWLEDGMENTS
Five original tables from four different journal articles are reprinted in the book for replication exercises. The articles include (1) Iversen, Torben, and David Soskice, 2006, "Electoral Institutions and the Politics of Coalitions: Why Some Democracies Redistribute More Than Others," American Political Science Review 100(2): 165–81, Table A1. Copyright: Cambridge University Press. (2) Frankel, Jeffrey A., and David Romer, 1999, "Does Trade Cause Growth?" American Economic Review 89(3): 379–99, Table 3. Copyright: American Economic Association. (3) Braithwaite, Alex, 2006, "The Geographic Spread of Militarized Disputes," Journal of Peace Research 43(5): 507–22, Table I and Table II. Copyright: SAGE Publications. (4) Bénabou, Roland, Davide Ticchi, and Andrea Vindigni, 2015, "Religion and 'Innovation,'" American Economic Review 105(5): 346–51, Table 2. Copyright: American Economic Association. Permissions to reprint the relevant tables in Iversen and Soskice (2006) and Braithwaite (2006) have been acquired and licensed from Cambridge University Press and SAGE Publications.
Jeffrey Frankel, Roland Bénabou, and the American Economic Association deserve special thanks for graciously granting me permission to reprint the relevant tables in their articles for free.
Figures 1 through 4 in F. J. Anscombe's "Graphs in Statistical Analysis," published in 1973 in The American Statistician 27(1): 17–21, have been adapted and used with permission of the publisher, Taylor & Francis Ltd, http://www.tandfonline.com.
This book would not have been possible without the encouragement, help, and support of many students, colleagues, and friends. My undergraduate students in Polimetrics and Senior Research Seminar at Texas A&M University gave me the first impetus to write this book. Many students taking those two courses, especially Jacob King and Alex Goodman, caught typos and mistakes in earlier drafts. During the summer of 2016, Scarlet Amo, Corbin Cali, Chandler Dawson, and Elizabeth Gohmert experimented with using an earlier version of the manuscript to self-study R for data analysis. They provided detailed reports on each chapter and completed independent application papers. Their input has dramatically changed and improved how various materials in the book are now presented and structured. I thank them for their extraordinary work and effort. My graduate assistants, Molly Berkemeier, Kelly McCaskey, and Austin Johnson, provided excellent editorial assistance. My colleagues and friends, Tiyi Feng, Ren Mu, Erica Owen, and Carlisle Rainey, read parts of an earlier draft and provided valuable feedback and suggestions.
Many people at Oxford University Press have helped to make this manuscript possible and better. Scott Parris, who was the editor for my first book with Cambridge University Press, had been patiently encouraging and prodding me to finish this book until his retirement from Oxford. Happy retirement, Scott! Before retiring from Oxford, Scott handed my case to Anne Dellinger. Anne's enthusiasm and encouragement were the main reason that I decided to stay with Oxford. After Anne departed from Oxford, David Pervin became my editor and offered sound advice. Scott's assistant Cathryn Vaulman and David's assistants Emily Mackenzie and Hayley Singer took care of many of the logistical issues in the process. Debbie Ruel corrected many errors and did a great job during copyediting, and Lincy Priya patiently dealt with my requests and smoothly handled the production of my book. Xun Pang and Jude Hays provided valuable comments and suggestions that helped to make the book enormously better.
Finally, my greatest debt of gratitude is owed to my wife, Liu, and my two children, Ellen and Andrew. Without their unyielding support, constant inquiry, and even reading parts of the book and checking my R code, I would not have finished the project. This book is dedicated to them!
INTRODUCTION
This book seeks to teach senior undergraduate and beginning graduate students in social sciences how to use R to manage, visualize, and analyze data in order to answer substantive research questions and reproduce the statistical analysis in published journal articles. Over the past several decades, statistical analysis training has become increasingly important for undergraduate and graduate students in many disciplines within the social and behavioral sciences, such as economics, political science, public administration, business, public health, anthropology, psychology, sociology, education, and communication. With rapid progress in statistical computing, proficiency in using statistical software has become almost a universal requirement, albeit to varying degrees, in statistical methods courses. Popular software choices include SAS, SPSS, Stata, and R. While SAS, SPSS, and Stata all have accessible introductory textbooks targeting students in social sciences, such textbooks on R are rare.
Compared with commercial packages like SAS, SPSS, and Stata, R has at least three strengths. It is a well-thought-out, coherent system that comes with a suite of software facilities for data management, visualization, and analysis. In addition, to meet emerging needs, a large community of R users constantly develops new open source add-on packages, which already number over 10,000. Finally, perhaps the greatest perk of the software is that it is free. This financial benefit cannot be overemphasized. Cash-strapped college students often find themselves relying on lab computers for access to SAS, SPSS, and Stata, or constrained by the limitations of the student versions of those commercial packages. Even after graduation, many find it difficult to convince their employers to purchase a particular commercial package they know for their everyday use.
There are many reasons why R is preferred to other statistical software packages in higher education. But R's greatest handicap to its widespread use in the social sciences is its steep learning curve. While the market has produced numerous books on R at various levels, introductory textbooks that focus on the needs of students in the social sciences are not easy to find. This book seeks to fill this void.
This book distinguishes itself from other introductory R or statistics books in three important ways. First, it intends to serve as an introductory text on using R for data analysis projects, targeting an audience rarely exposed to statistical programming. The rationale for emphasizing the introductory nature of this book is simple: it is driven by the needs and heterogeneity of the student body we often come across in classroom teaching in social science departments. Unlike students in math and statistics, many student users of R in social sciences have no experience in any computing language or programming software, and many will never achieve a higher level of programming beyond what is necessary for their everyday use of R. However, students in social sciences will find that the opportunity to use R for data manipulation, visualization, and analysis frequently presents itself in various courses and future careers. Hence, they need to become proficient at accomplishing common tasks in data manipulation, visualization, and analysis using R, without getting overly technical. In this respect, existing introductory texts on R programming that do not involve statistics tend to be overly comprehensive in coverage and are often geared toward students in math, statistics, sciences, and engineering, thus intimidating most social science students. Alain Zuur, Elena Ieno, and Erik Meesters' A Beginner's Guide to R and Philip Spector's Data Manipulation with R are good examples. Their target audiences often are students majoring in math, statistics, sciences, and engineering, who have more experience in programming than their fellow classmates in social sciences.
This book, in contrast, adopts a minimalist approach in teaching R. It covers only the most important features and functions in R that one will need for conducting reproducible research projects, with other materials moved to chapter appendices or removed from consideration completely. R is extremely flexible, almost always allowing multiple solutions to one programming task. While this is a strength, it does challenge beginning R users rarely exposed to computer programming. The minimalist approach adopted here will typically present one way to deal with a task in the main part of a chapter, leaving other approaches to a section called "Miscellaneous Q&As for Ambitious Readers." As a result, the minimalist approach should flatten the steep learning curve, a commonly noted disadvantage of R, thereby improving the software's accessibility to undergraduates and similar audiences. Organizationally, this book breaks down chapters into small sections that mimic lab sessions for students. Each chapter focuses on only the essential R functions one needs to know in order to manipulate, visualize, and analyze data to accomplish some primary statistical analysis tasks. In the end, through this minimalist approach, the reader will accumulate enough R knowledge and skills to complete a course research project and to self-study more advanced R materials if necessary.
A second unique feature of this book is its emphasis on meeting the practical needs of students using R to conduct statistical analysis for research projects driven by substantive questions in social sciences. In addition to homework assignments and problem sets, statistical methods courses in social sciences often require the completion of a full-blown, substantively motivated research project. Such training is critical if statistical knowledge is to prove to be of any value and relevance to substantive courses and students' future careers. Ideally, students can utilize completed statistical analysis papers as writing samples to showcase their quantitative skills in their graduate school or job applications.
In practice, to accomplish such a project on a substantive question, a student has to collect, clean, and manipulate data, visualize and analyze data systematically to address the question asked, and report findings in an organized manner. Many R books for introductory statistics tend to emphasize the R code for statistical techniques, giving insufficient attention to the pre-analysis needs of users as well as the process of completing a research project. For example, John Verzani's Using R for Introductory Statistics and Michael Crawley's Introductory Statistics Using R are two popular texts in this category. In such texts, data preparation is not linked to particular research projects that address substantive questions.
In contrast, this book is written under the premise that the reader uses R primarily to address some substantive question of interest. This leads to several notable differences from other introductory statistics books using R. This book begins with the use of R to get an original raw dataset into a condition appropriate for statistical analysis, thus emphasizing how to deal with various issues that arise in such a process. Next, instead of starting with the interactive use of R, which is typical in other textbooks, this book gives exclusive attention to writing and executing R programs. This approach allows easy verification, recollection, and replication of analysis, and it is almost always how things are done in actual reproducible research. Students following this approach will write many well-documented R programs that address a variety of practical issues, so that they can save those programs for future reference. Last but not least, the use of R in this book is closely integrated into a prototypical process that consists of a sequence of elements: a substantive question to be answered, a hypothesis that answers the question, the logic of statistical inference behind the empirical test of the hypothesis, the test statistic for statistical inference represented in mathematical notation and implemented computationally in R, and the presentation of findings in an organized manner. The emphasis is on an in-depth understanding of why we do statistical analysis and how R fits into actual empirical research. Hence, this research process- or project-oriented approach ought to significantly increase the likelihood that students will actually use R to solve problems in their future courses and careers.
A third unique feature of this book is its emphasis on teaching students how to replicate statistical analyses in published journal articles. Scientific progress requires that previous findings be replicable and replicated; scientific education, as in physics and chemistry, always includes lab exercises that replicate previous experiments. As social scientific knowledge becomes increasingly evidence-based and relies on extensive data analysis, learning to replicate published results is a necessary step for undergraduates and first-year graduate students in learning to conduct social scientific research. Such training has now become feasible because of the availability of powerful free software and a wide range of datasets in the public domain. Many journals now require authors to submit and deposit replication datasets. Many original data from surveys and archival research are downloadable from the internet. Students no longer have to be just passive consumers of social scientific research but instead can actively scrutinize published research, play with the data, and reproduce or fail to reproduce previous findings. This will convert students from passive consumers into active learners. As reproducing research findings becomes the norm rather than the exception, it will empower the students, lower the barrier to their entry into the academic community, and challenge the professors and other knowledge producers. The wide gap between teaching and research commonly observed in undergraduate courses in social sciences will be narrowed. Such changes are likely to make teaching more interesting for professors, render learning more fruitful for students, and enable both parties to become more successful in their endeavors.
This book consists of eight chapters. Chapter 1 introduces R, illustrating how to write and execute programs using the software. Chapter 2 goes through the process of, and various main tasks in, getting data ready for analysis in R. Chapter 3 provides a conceptual background on the logic of statistical inference and then demonstrates how to make statistical inferences with respect to one continuous outcome variable using one- and two-sample t-tests. Chapter 4 moves into analyzing the relationship between two continuous variables, focusing on covariance and correlation. Chapter 5 introduces regression analysis, covering its conceptual foundation, model specification, estimation, interpretation, and inference. Chapter 6 continues with regression analysis, delving into various diagnostics and sensitivity analyses. Chapters 4 through 6 follow the same approach, integrating conceptual and mathematical foundations, data preparation, statistical analysis, and results reporting within each chapter. Chapter 7 walks readers through the process of replicating two published analyses. Finally, Chapter 8, as an appendix, provides a brief introduction to analyzing discrete data, demonstrating the chi-squared test of independence and logistic regression.
No textbook can be perfect; this one is no exception. The minimalist approach, emphasizing the accessibility of R, comes at a price. Many commonly used functions and features of R, such as writing functions and loops, are not covered. Similarly, by focusing on teaching the research process of how to use R to address substantive questions, this book primarily covers explaining one continuous outcome variable and relevant statistical techniques, such as mean, difference of means, covariance, correlation, and cross-sectional regression. Hence, comprehensiveness in both programming and statistics is sacrificed, on purpose, for greater accessibility, clarity, and depth. The goal is to make this book accessible and useful for novices in both programming and data analysis.
In sum, this book integrates R programming, the logic and steps of statistical inference, and the process of empirical social scientific research in a highly accessible and structured fashion. It emphasizes learning to use R for essential data management, visualization, analysis, and replication of published research findings. By the end of this book, students will have learned how to do the following: (1) use R to import data, inspect data, identify dataset attributes, and manage observations, variables, and datasets; (2) use R to graph simple histograms, boxplots, scatter plots, and research findings; (3) use R to summarize data, conduct a one-sample t-test, test the difference of means between groups, compute covariance and correlation, estimate and interpret ordinary least squares (OLS) regression, and diagnose and correct regression assumption violations; and (4) replicate research findings in published journal articles. The principle behind this book is to teach students to learn as little R as possible but to do as much substantively driven data analysis at the beginner or intermediate level as possible. The minimalist approach should dramatically reduce the learning cost but still prove adequate for meeting the practical research needs of senior undergraduate and beginning graduate students in the social sciences. Having completed this book, students can competently use R and statistical analysis to answer substantive questions regarding some substantively interesting continuous outcome variable in a cross-sectional design. It is my hope that the newly acquired competence will motivate students to want to, rather than being forced to, learn more about R and statistics.
Learn about R and Write First Toy Programs
Chapter Objectives
In this first chapter, we will aim to achieve the following objectives:
1. Understand when to use R in a research project.
2. Learn about the basic background of R, software installation, and getting help.
3. Learn to set up a project folder for R programs and data files.
4. Learn to write and execute simple toy programs.
5. Learn to find and set the working directory for a project in R.
6. Learn to create a data vector.
7. Learn to calculate descriptive statistics and handle missing values.
8. Learn to convert a data vector into a data frame.
9. Learn to refer to a variable within a data frame.
10. Learn to install an add-on package, "stargazer," load it into R, and use it to get a descriptive statistics table.
11. Learn to graph the distribution of a variable.
12. Apply all the lessons learned to a real-world data example.
13. Learn about common coding errors and how to get help.
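As a quick preview of objectives 6 through 9, here is a minimal base R sketch. The names `vd` and `v1` echo the chapter's figure titles, but the values are hypothetical and chosen only for illustration; this is not the chapter's own toy example.

```r
# Hypothetical toy data: a numeric vector with one missing value (NA)
v1 <- c(1, 3, 2, 5, NA, 4)

# Descriptive statistics; without na.rm = TRUE, the NA makes mean() return NA
mean(v1)                  # NA
mean(v1, na.rm = TRUE)    # 3
median(v1, na.rm = TRUE)  # 3
sd(v1, na.rm = TRUE)      # sample standard deviation of the five observed values
summary(v1)               # min, quartiles, mean, max, and the number of NAs

# Convert the vector into a data frame; the column takes the name v1
vd <- data.frame(v1)

# Refer to the variable within the data frame with the $ operator
vd$v1
mean(vd$v1, na.rm = TRUE) # 3
```

The `na.rm = TRUE` argument is the usual way many base R summary functions are told to drop missing values before computing.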
Materials in this chapter need about an hour and a half for a class of about 10 students to cover in a lab, including brief lecturing and hands-on practice. Larger classes or self-study could take longer.
When to Use R in a Research Project
Completing an empirical research project involves several stages, often starting with the identification of a research problem and ending with the report of findings and implications:
1. Identify a research problem
2. Survey the literature (find out what is known about the problem)
3. Formulate a theoretical argument and some testable hypotheses
4. Measure concepts
5. Collect data
6. Prepare data
7. Analyze data
8. Report findings and implications
The tasks of identifying a significant and interesting research problem, surveying the extant literature, formulating a coherent theoretical argument and some testable hypotheses that explain the research puzzle, measuring concepts in the theory empirically, and collecting data for the empirical indicators of the concepts (tasks 1 to 5) are generally dealt with in substantive and research design courses in a field. Those topics are beyond the scope of this little R book. Yet tasks 6 to 8 may all involve R as a research instrument. Specifically, R is used in actual research projects to analyze particular research problems, such as evaluating the impact of a policy or testing the impact of a causal factor (or an independent variable) on an outcome (or a dependent variable) of interest, as postulated by pre-specified theoretical expectations. How to accomplish tasks 6 to 8 will be illustrated in the following chapters.
A research project of this type presents at least two challenges, for which R will be useful. First, in practice, such a project involves a range of tasks, such as importing data into software, merging different datasets together, verifying data, creating new variables, recoding and renaming variables, visualizing data, running statistical estimation procedures, carrying out diagnostic tests, and so on. Second, an analyst needs to be able to reproduce his or her own analysis, including dataset construction and estimation results, even years later. The first challenge concerns the efficiency of an analysis, whereas the second concerns the reproducibility and integrity of the analysis.
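To make several of these tasks concrete, the following is a minimal base R sketch of merging, verifying, creating, recoding, and renaming variables. The two datasets, the variable names, and the 30 percent cutoff are all hypothetical, chosen only for illustration; later chapters handle real data.

```r
# Two small hypothetical datasets sharing a country identifier
gdp   <- data.frame(country = c("A", "B", "C"), gdp = c(100, 250, 80))
trade <- data.frame(country = c("A", "B", "C"), exports = c(30, 50, 40))

# Merge the datasets by their common key (rows come back sorted by country)
dat <- merge(gdp, trade, by = "country")

# Verify the result: dimensions and number of missing values
dim(dat)          # 3 rows, 3 columns
sum(is.na(dat))   # 0

# Create a new variable: exports as a percentage of GDP
dat$exp_share <- 100 * dat$exports / dat$gdp

# Recode: flag economies whose export share exceeds a hypothetical 30% cutoff
dat$open <- ifelse(dat$exp_share > 30, 1, 0)

# Rename a variable
names(dat)[names(dat) == "gdp"] <- "gdp_total"

dat
```

Because every step lives in a script, rerunning it rebuilds the same dataset, which speaks to both the efficiency and the reproducibility challenges.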
To achieve both efficiency and reproducibility, experienced analysts always choose to write down their computing code in one or more programs so that the code can be submitted, revised, and resubmitted to reproduce an analysis speedily and whenever necessary. Hence, in this book, we will focus on how to write and submit R programs for specific tasks in a program editor, rather than the interactive use or menu-driven interface of R. For all practical purposes, the programming approach is much more efficient and consistent than the interactive or menu-driven approach.
Before we step into how to use R, we will need to clarify some related organizational and housekeeping issues. In this chapter, we will first offer a very brief introduction to R, then demonstrate how to install R, write and execute R programs, install and load add-on packages, and produce graphical and numerical output, and then turn to essential reference information about important symbols and common coding errors. Notably, each line of R code will likely appear three times: presented as a stand-alone command line preceded or followed by an explanation of its purpose and function, listed together with the output from its execution, and collated with all other program code in the chapter for the sake of convenient reference. We will end the chapter with a section about miscellaneous issues of interest to ambitious readers and a section on exercises.
Essentials about R
A One-Paragraph Introduction to R
R is a computer language and an environment for statistical computing and graphics with important advantages. Started by Robert Gentleman and Ross Ihaka of the University of Auckland in 1995, it is now maintained by the R core development team of volunteer developers. R is referred to as a computer language because, as a dialect of the S language developed in the late 1980s at AT&T's labs, R allows users to follow the algorithms, define and add new functions, and write new analytic methods, rather than merely supplying canned routines. R is also a coherent system that provides an environment with an integrated suite of software facilities for data storage, manipulation, analysis, and visualization. In addition, R is flexible. It runs on Windows, UNIX, and Mac OS X. It can be easily extended in terms of new functions and state-of-the-art statistical methods; the over 10,000 add-on packages available through the CRAN family of internet sites by the end of January 2017 testify to this fact. Last but not least, R is free, as are its numerous add-on packages. Hence, R is popular among practitioners in many fields and scholars in many disciplines, including the social sciences.
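Writing functions is outside the scope of this book, but a tiny sketch may clarify what "define and add new functions" means. The function name `se_mean` is hypothetical; it computes the standard error of a sample mean, sd(x) divided by the square root of n, and is shown only as an illustration.

```r
# A hypothetical user-defined function: the standard error of a sample mean,
# computed as sd(x) / sqrt(n) after optionally dropping missing values
se_mean <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Once defined, it is called just like a built-in function
se_mean(c(2, 4, 4, 4, 5, 5, 7, 9))
se_mean(c(1, 2, NA, 3))   # NA dropped; equals sd(c(1, 2, 3)) / sqrt(3)
```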
Installation
As open source software for statistical computing, R can be easily downloaded from the following site: http://www.r-project.org/. We may simply click on the highlighted download R link to reach a list of CRAN mirror sites. Clicking on any site we prefer directs us to the page for downloading the software for three different platforms: Linux, Windows, and Mac. R works slightly differently across the three platforms. For the purpose of this book, we will focus on the language and functionality specific to the Windows platform. Mac users may consult the Miscellaneous Q&A section later in the chapter for some brief explanation.
R is constantly being updated to new versions by developers. It is worth noting that some R programs and packages used in this book could require R version 3.3.2 or newer. If the version of R on a machine is not up to date, one may simply uninstall the old version and install the latest version following the procedures described previously, or refer to the subsection on how to update R in the Miscellaneous Q&A section.
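One way to check whether an installation meets the 3.3.2 requirement is to ask R directly. The minimal sketch below uses the base functions `R.version.string` and `getRversion()`; the warning text in the guard is a hypothetical example of what one might put at the top of a program.

```r
# Report the version of the running R session, e.g. "R version 3.3.2 (...)"
R.version.string

# getRversion() returns a version object that compares correctly with a string
getRversion() >= "3.3.2"

# A hypothetical guard one might place at the top of a program
if (getRversion() < "3.3.2") {
  warning("This R is older than 3.3.2; some code in this book may not run.")
}
```

Comparing version objects, rather than plain strings, avoids errors such as "3.10.0" sorting before "3.3.2" alphabetically.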
How to Start a Project Folder and Write Our First R Program
Learn to Set Up a Project Folder for Programs and Data Files
The first step in a project is to set up a project folder to hold relevant datasets, programs, and output files. We can think of a project folder as our home mailing address, and all the relevant datasets, programs, and output files as the mail and packages to be delivered to us. Without the mailing address, the packages and mail will not be delivered to the right place. Hence, a project folder allows us to easily find all the relevant files and avoid having them mingled and conflated with files for other projects or purposes.
In Windows, we can create a project folder via the following steps: open My Computer or File Explorer; right-click on the root directory, such as C: or D:; click on New; click on New Folder or Folder; and type in a meaningful name for the new folder, such as Project.
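The same folder can also be created from within R, which keeps even this step inside a reproducible program. A minimal sketch follows; to keep it runnable on any machine, it assumes a hypothetical `Project` folder inside R's temporary directory rather than `C:/Project`.

```r
# Hypothetical location: a "Project" folder inside R's temporary directory,
# so the sketch runs on any machine; on Windows one might use "C:/Project"
project <- file.path(tempdir(), "Project")

# Create the folder; showWarnings = FALSE silences the warning if it exists
dir.create(project, showWarnings = FALSE)

# Confirm that the folder now exists
dir.exists(project)   # TRUE
```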
Learn to Find and Set a Working Directory for a Project
When we open R, the default interface is the R Console window, which is based on the interactive mode. To create an R program, we should go to the R Editor window. To do so, we can open R, click on File on the menu bar, and then click on New script to open the R program editor. Now, click on the Save button on the toolbar or the Save option under File, and we will be prompted to enter a file name for an R program file that ends with .R. For an experiment, name the file session1.R (remember to end it with .R), and then save the file in the Project folder.
Learn to Write and Execute the Simplest Toy Program
Now is the time for us to learn to write an extremely simple R program and run it. R has a default working directory or folder (think of it as the post office address for mail and packages). We are interested in telling R to change the current default working directory to the Project folder. It is like directing our mail to be delivered to our own home address, rather than the post office address. The Project folder is where we keep our program and data files.
To do so, we first identify the working directory of the current R session, then change it to the Project folder, and finally verify that the change is successful. In the program editor, first type in
getwd()
The getwd function returns the path of the current working directory. Next, type in
setwd("C:/Project")
The setwd function changes the current working directory for the current R session. The argument of the function is inside the parentheses, between double quotation marks, and uses one forward slash; it specifies the path to the Project folder as the new current working directory for the current R session. This line of code makes it possible, during the rest of our R session, for us to refer to the files within the Project folder without specifying the path again. Finally, note that R is case sensitive, so it is good practice to treat Project and project as two different folders and to match the folder name exactly. (Whether the operating system itself distinguishes them depends on its file system: Windows usually does not, while Linux does.) If the path in the program code does not match an existing folder, for example because of a misspelling, R will produce an error message. Also note that any mismatch in terms of quotation marks, colons, etc. will cause R to produce an error message.
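To see what happens when the path given to setwd() does not match an existing folder, the sketch below deliberately points setwd() at a folder that does not exist and catches the resulting error with tryCatch(). The folder name No_Such_Folder is hypothetical; the functions are all base R.

```r
# Remember where we start so we can check that nothing changed
start_dir <- getwd()

# A path to a folder that does not exist (hypothetical name)
bad_path <- file.path(tempdir(), "No_Such_Folder")

# setwd() signals an error such as "cannot change working directory";
# tryCatch() intercepts it so we can report it instead of stopping
result <- tryCatch(
  {
    setwd(bad_path)
    "changed"
  },
  error = function(e) {
    message("setwd() failed: ", conditionMessage(e))
    "error"
  }
)

print(result)                         # "error"
print(identical(getwd(), start_dir))  # the working directory is unchanged
```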
In specifying the path, we may use one forward slash as above, or alternatively, a double backslash as follows:
setwd("C:\\Project") Pleasenotethedoublebackslashes.Thisisveryimportant.Ifwecopythe pathfromourcomputer FileExplorer,thecopiedandpastedpathwillcontain onlyonebackslash.ForR,wewillneedtoaddanextrabackslash,orchangeit tooneforwardslash.
Finally, type in again
getwd()
This allows us to verify that the task is done as instructed. We save these three lines of code into a program file called session1.R. This three-line R program asks R to display the default working directory, then sets the Project folder as our new current working directory, and finally asks R to display the current working directory again.
Having the .R suffix in the program file name is good practice for two reasons. First, it helps us see immediately that it is an R program file. Second, when we open a program file in the R editor, all files with the .R suffix will appear automatically in the list of files for us to choose from. If the program file does not have the .R suffix and we want to open it in the R program editor, it will not show up automatically in the list of files; we will have to choose "All Files (*.*)" from the file type list in the lower right corner in order to see all files in the folder.
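The same filtering by suffix can be done from R with list.files(), whose pattern argument takes a regular expression. This minimal sketch creates two hypothetical files in a scratch folder under tempdir(), one with and one without the .R suffix, and lists only the R program files.

```r
# A scratch folder so the example does not touch real project files
demo_dir <- file.path(tempdir(), "suffix_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Two hypothetical files: one R program, one plain-text note
invisible(file.create(file.path(demo_dir, "session1.R")))
invisible(file.create(file.path(demo_dir, "notes.txt")))

# "\\.R$" matches names ending in ".R"; the dot is escaped so it is literal
r_files <- list.files(demo_dir, pattern = "\\.R$")
print(r_files)  # "session1.R"
```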
getwd()
setwd("C:/Project")
getwd()
To execute this little program in R, we may choose one of the following two ways:
1. If we want to execute the program line by line, put the cursor anywhere in that line of code; then we can execute it in one of three ways: (a) press the two keys Ctrl+R on the keyboard together; (b) right-click and then click on Run line or selection; (c) click on the third little icon (right next to the second, Save script, icon) in the upper left corner, representing Run line or selection.
2. If we want to execute the whole program in one run, highlight the whole program in the R editor, and then either right-click and click on Run line or selection, or press the two keys Ctrl+R on the keyboard together.
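Besides Ctrl+R, a saved program file can be run in one step with base R's source() function; with echo = TRUE, each line is printed before its output, much like running the highlighted program. The sketch below writes a small hypothetical stand-in script to a temporary file so that it runs anywhere; in practice one would call source("session1.R") once the working directory is the Project folder.

```r
# Write a tiny stand-in for session1.R to a temporary file
script_path <- file.path(tempdir(), "session1_demo.R")
writeLines(c(
  "# show current working directory path",
  "getwd()",
  "x <- 1 + 1",
  "x"
), script_path)

# Run the whole file; echo = TRUE shows each expression and its printed value
source(script_path, echo = TRUE)

# Objects created by the script now exist in the workspace
print(x)  # 2
```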
When we execute the program above, we will get the following output in R:
getwd()
[1] "C:/Users/QuanLi/BoxSync/RBook/Rnw_oup_formal"
setwd("C:/Project")
getwd()
[1] "C:/Project"
Note that the first line of code, getwd(), shows that the default current working directory was "C:/Users/QuanLi/BoxSync/RBook/Rnw_oup_formal", in which I have kept my knitr .Rnw and LaTeX files for writing this R book. Then, the second line of code asks R to set the current working directory to "C:/Project" instead. The third line of code shows that, for the rest of this R session, files will be drawn from or saved to this new working directory unless otherwise specified via a different file path.
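Two consequences of this can be shown in a short sketch. First, once the working directory is set, a bare file name such as mydata.csv (a hypothetical file with made-up values) is written to and read from that folder. Second, setwd() invisibly returns the previous working directory, so it can be saved and restored later. The example uses a folder under tempdir() in place of C:/Project.

```r
# Use a temporary folder as a stand-in for C:/Project
project_dir <- file.path(tempdir(), "Project")
dir.create(project_dir, showWarnings = FALSE)

# setwd() returns the old working directory (invisibly); keep it
old_dir <- setwd(project_dir)

# A bare file name is resolved relative to the working directory
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)),
          "mydata.csv", row.names = FALSE)
print(file.exists(file.path(project_dir, "mydata.csv")))  # TRUE

# Reading it back needs no path either
mydata <- read.csv("mydata.csv")
print(mydata)

# Return to the original working directory
setwd(old_dir)
```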
One essential point about programming is that one should document the purpose of a program as a whole and what each line of code does, so that days, weeks, or months from now, we or anyone else who opens the program will be able to understand what the program does and how it does it. For this purpose, we insert comment lines that begin with the # sign into a program. The # sign tells R to ignore everything from that sign to the end of the line, so a line that begins with # is not executed. Note that the first comment specifies the purpose, date, and software version used. After adding comment lines, the little toy R program above is complete and looks like the following:
# First R toy program, today's date, R version 3.2.3
# show current working directory path
getwd()
# change the working directory for program to project folder
setwd("C:/Project")
# show current working directory path again to verify
getwd()
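The header comment need not be typed by hand. In this minimal sketch, base R's Sys.Date() and R.version.string supply today's date and the running R version, and cat() prints a line that can be pasted as the first comment of the program. The wording "First R toy program" follows the example above.

```r
# Assemble a header comment: purpose, today's date, and the R version in use
header <- paste0("# First R toy program, ", format(Sys.Date()), ", ",
                 R.version.string)

# Print it so it can be copied into the top of session1.R
cat(header, "\n")
```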
To demonstrate the general process graphically, Figure 1.1 presents four screenshots from the R console and editor, which proceed from opening R (picture 1), to opening the R editor (picture 2), to typing the three lines of code in the R editor (picture 3), to running those three lines and producing output in the R console (picture 4).
The biggest benefit of writing and saving a program is that we can reproduce the same output at any time, so long as the functions of the software remain unchanged. For a reader unfamiliar with this approach of working with a software package, it might be useful to close R, open it again, and re-run the little program for replication and verification. Remember to save the program before closing it; otherwise, we will lose all the changes made since we last saved it. The ability to run the same program and produce the same result years later is one of the most important reasons why we prefer to program, rather than use the interactive mode via point and click.
Since R is an object-oriented programming language, it is useful to know something about how R works with data. A simplified view is that R relies on a variety of functions, which take in data as input and then produce desired output