LeaRning data science in R course sessions

Intro to R for Exploratory Data Analysis
Hugo Tavares
2019-10-21

This document gives the preliminary schedule for our weekly R sessions. This is subject to change, depending on the pace of the class. All the materials will be provided online. The link to the course materials and instructions on how to set up your computer will be given closer to the date.

While it is not compulsory to attend all sessions, it is highly advised to attend as many as you can, even if you think that you might know some of the materials already. This is because we will not just learn R itself, but also (and importantly) about fundamental principles of exploratory data analysis. In that sense, R will be taught as a tool that gives us the freedom to ask a range of questions of our data in a reproducible manner. Also, having weekly modules will allow you to practise and consolidate learned concepts between sessions (I will provide some exercises/questions that you can optionally think about at home). We can also discuss difficulties that you encounter throughout the sessions, to try and achieve a clear understanding of the tools we cover.

Session 1: This session is about getting started with working with data in R + RStudio:
• how to set up and organise an analysis project
• use functions and access their documentation
• import tabular data into R
• understand and distinguish between the fundamental data structures and data types in R
• discuss the importance of, and apply, basic quality control checks of data

Session 2: This session is all about the visual display of information:
• recognise the necessary elements to build a plot using the ggplot2 package
• use ggplot2 to produce several kinds of visualisations (for continuous and/or discrete data)
• distinguish which types of visualisation are adequate for different types of data and questions
• discuss the importance of scales when analysing and/or visualising data
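
To give a rough idea of the elements that make up a ggplot2 plot, here is a minimal sketch, assuming the ggplot2 package is installed. It uses `mtcars`, a small dataset that ships with R, purely for illustration; it is not necessarily the data we will use in class.

```r
library(ggplot2)

# mtcars is a small example dataset that ships with R (an illustrative choice)
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +  # data + aesthetics
  geom_point() +       # geometry: draw one point per car
  scale_x_log10() +    # scale: show the x axis on a log10 scale
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

p  # printing the object draws the plot
```

Swapping `geom_point()` for another geometry, or changing the scale, changes the picture without touching the data, which is the kind of choice this session will discuss.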

Session 3: This session is about starting to manipulate tables and cleaning up data:
• use of the dplyr package to manipulate tabular data (add or modify variables, select and rename columns)
• understand and use “pipes” as a way to build a chain of operations on data
• discuss some common issues with data cleaning and use functions from the stringr package to help solve them
• discuss how to deal with missing data
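
As a rough taste of these ideas, here is a minimal sketch, assuming the dplyr and stringr packages are installed. The `plants` table and its column names are made up for illustration and are not part of the course materials; dropping rows is only one of several possible ways to handle missing values.

```r
library(dplyr)
library(stringr)

# Hypothetical example data (not from the course materials)
plants <- data.frame(
  Genotype  = c(" wt", "WT ", "mutant", "Mutant"),
  height_mm = c(120, NA, 95, 101)
)

# The pipe (%>%) passes the result of each step on to the next one
plants_clean <- plants %>%
  # select columns, renaming Genotype to genotype along the way
  select(genotype = Genotype, height_mm) %>%
  # tidy inconsistent text: trim surrounding spaces and use lower case
  mutate(genotype = str_to_lower(str_trim(genotype))) %>%
  # add a new variable derived from an existing one
  mutate(height_cm = height_mm / 10) %>%
  # one way to deal with missing data: keep only rows with a measured height
  filter(!is.na(height_mm))

plants_clean
```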

Session 4: This session is about subsetting data based on specific questions or conditions:
• introduction to logical comparisons and the logical data type
• remember and distinguish between different types of logical operators
• choose the right conditional operations to obtain specific observations from data using the filter() function
• apply conditional operations to correct mistakes in data and/or highlight elements of interest in graphs using the ifelse() function
• how to save processed data into a file

Session 5: This session is about working on sub-groups of data:
• recognise when to use grouped operations in data analysis
• differentiate between grouped summaries and other types of grouped operations
• apply grouped summaries using the group_by() + summarise() functions
• apply other grouped operations such as: group_by() + filter() and group_by() + mutate()
• recognise the importance of the ungroup() function
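
The difference between a grouped summary and other grouped operations can be sketched as follows, assuming the dplyr package is installed. The `flowers` table is a made-up example, not course data.

```r
library(dplyr)

# Hypothetical example data (not from the course materials)
flowers <- data.frame(
  species = c("a", "a", "a", "b", "b"),
  petals  = c(4, 6, 5, 10, 12)
)

# Grouped summary: collapses the table to one row per group
petal_means <- flowers %>%
  group_by(species) %>%
  summarise(mean_petals = mean(petals), n = n())

# Grouped mutate: keeps every row and adds a value computed within each group
flowers_centred <- flowers %>%
  group_by(species) %>%
  mutate(petals_centred = petals - mean(petals)) %>%
  ungroup()  # remove the grouping so later steps act on the whole table

# Grouped filter: keep the row(s) with the most petals within each species
flowers_max <- flowers %>%
  group_by(species) %>%
  filter(petals == max(petals)) %>%
  ungroup()
```

Without `ungroup()`, the table stays grouped and any later `mutate()` or `filter()` would silently keep operating within each species.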

Session 6: This session is about combining different datasets together and how to modify the “shape” of our data:
• understand and apply functions that reshape data from “long” to “wide” formats
• appreciate when each format is useful and which one to choose for the intended tasks
• join different datasets together
• distinguish between different ways to join data, and understand which one to choose depending on the desired outcome
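
Reshaping and joining can be sketched as below, assuming the dplyr and tidyr packages are installed. The choice of `pivot_wider()` and `pivot_longer()` is an assumption on my part (the schedule does not name the reshaping functions), and both tables are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data (not from the course materials)
measurements <- data.frame(
  plant = c("p1", "p1", "p2", "p2"),
  trait = c("height", "leaves", "height", "leaves"),
  value = c(12, 8, 10, 6)
)
plant_info <- data.frame(
  plant    = c("p1", "p3"),
  genotype = c("wt", "mutant")
)

# Long -> wide: one row per plant, one column per trait
measurements_wide <- measurements %>%
  pivot_wider(names_from = trait, values_from = value)

# Wide -> long: back to one row per plant/trait combination
measurements_long <- measurements_wide %>%
  pivot_longer(cols = c(height, leaves), names_to = "trait", values_to = "value")

# left_join() keeps every row of the first table; p2 has no match, so genotype is NA
joined_left <- left_join(measurements_wide, plant_info, by = "plant")

# inner_join() keeps only plants present in both tables (here just p1)
joined_inner <- inner_join(measurements_wide, plant_info, by = "plant")
```

Which join to use depends on whether unmatched rows should be kept (with missing values) or dropped.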
