## Welcome to bootcamp!

This bootcamp was originally designed for my MD/PhD cohort at
UW-Madison. However, anyone can follow along! This curriculum is
designed for individuals new to both R and coding in general. NO PRIOR
CODING EXPERIENCE REQUIRED! **I have not created any materials presented
in this bootcamp.** Rather, I have put together a roadmap of resources
that I think will help students build the coding skills and statistical
knowledge needed to analyze a large, “high-dimensional” dataset.

Phase 1 consists of Codecademy Pro’s R course, plus practice sets designed by Jenny Bryan for UBC’s STAT545 course. By the end of Phase 1 of this bootcamp, you should be able to:

- manipulate a dataset into “tidy” format
- generate a variety of different graphs from a “tidy” dataset
- analyze trends over different groupings of data
- calculate mean/median/mode/IQR and perform basic statistical tests (t-tests, generalized linear models)
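
As a rough illustration of where Phase 1 is headed, here is a minimal base-R sketch of the skills listed above (plotting aside). The toy measurements, the column names, and the object names (`wide`, `tidy`, `group_means`, `test_result`) are all made up for this example and do not come from the course materials.

```r
# Hypothetical toy data in "wide" format: one column per group.
wide <- data.frame(
  replicate = 1:4,
  control   = c(5.1, 4.8, 5.5, 5.0),
  treated   = c(6.2, 5.9, 6.8, 6.1)
)

# Reshape to "tidy" (long) format: one row per observation.
tidy <- data.frame(
  replicate = rep(wide$replicate, times = 2),
  group     = rep(c("control", "treated"), each = nrow(wide)),
  value     = c(wide$control, wide$treated)
)

# Analyze a trend over a grouping: the mean value within each group.
group_means <- aggregate(value ~ group, data = tidy, FUN = mean)

# Spread of the whole dataset.
overall_iqr <- IQR(tidy$value)

# Basic statistical test: two-sample (Welch) t-test between the groups.
test_result <- t.test(value ~ group, data = tidy)
print(group_means)
print(test_result$p.value)
```

The course itself may teach different functions for the same tasks; the point here is only the shape of the workflow: tidy the data, summarize by group, then test.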

After getting your feet wet with data cleaning/data viz, we will move on to more complex concepts featured in the R Software Carpentry workshops that will make your coding life easier! By the end of Phase 2, you should be able to:

- understand how to construct and interpret an if/else statement
- push changes to and pull changes from GitHub
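
For a taste of the first item, here is a small sketch of an if/else statement in R. The function name `label_p_value` and the 0.05 cutoff are assumptions chosen for illustration, not part of the Software Carpentry lessons.

```r
# Hypothetical helper: describe a p-value relative to a cutoff (alpha).
label_p_value <- function(p, alpha = 0.05) {
  if (p < alpha) {
    # This branch runs only when the condition is TRUE.
    "significant"
  } else {
    # This branch runs in every other case.
    "not significant"
  }
}

print(label_p_value(0.01))
print(label_p_value(0.20))
```

Reading it aloud helps: "if p is below alpha, return one label; otherwise, return the other."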

Phase 3 of this bootcamp will provide an overview of the statistics concepts needed to understand most computational approaches, particularly those in the genomics field. We will focus on two-group comparisons, linear modeling, multiple testing, and supervised/unsupervised learning. Phase 3 will refer to slide decks prepared for UBC’s STAT540 course. By the end of Phase 3, you should be able to:

- understand what quality checks should be performed on a dataset before conducting any analyses
- understand the statistical underpinnings of two-group comparisons and linear models
- know why multiple testing is important in “big data” studies
- perform principal component analysis (PCA)
- understand supervised learning and unsupervised clustering methods
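
To make two of these goals concrete, here is a hedged base-R sketch, not taken from the STAT540 slides. Part 1 simulates why multiple testing matters; part 2 runs a PCA on R's built-in `iris` dataset. The seed, the number of tests, the group sizes, and the choice of the Benjamini-Hochberg correction are all assumptions made for this example.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

# Part 1: multiple testing. Run 1000 t-tests where both groups come from the
# same distribution, so every "significant" result is a false positive.
raw_p <- replicate(1000, t.test(rnorm(10), rnorm(10))$p.value)
adj_p <- p.adjust(raw_p, method = "BH")  # Benjamini-Hochberg correction

n_raw_hits <- sum(raw_p < 0.05)  # expect roughly 5% of tests by chance alone
n_adj_hits <- sum(adj_p < 0.05)  # never more than n_raw_hits

# Part 2: PCA on the four numeric columns of the built-in iris dataset.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
variance_explained <- pca$sdev^2 / sum(pca$sdev^2)  # proportion per component

print(c(raw = n_raw_hits, adjusted = n_adj_hits))
print(round(variance_explained, 3))
```

With no real effect anywhere, an uncorrected 0.05 cutoff still flags dozens of tests out of 1000, which is exactly the problem that corrections such as Benjamini-Hochberg are meant to control in "big data" studies.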

## Syllabus

### Phase 1

To complete this phase, you will need to sign up for Codecademy’s Pro R course. After completing each lesson, do the practice sets in your Google Colab document. If a lesson does not have any associated practice sets, continue on to the next lesson. If you have any questions, just leave a comment in the Google Colab document and tag me.

Lesson Title | More practice | Practice answers | Due date |
---|---|---|---|
Introduction to R Syntax | | | June 1st |
Learn R: Data Frames | | | June 1st |
Data Cleaning | | | June 1st |
Data Visualization with ggplot2 | | | June 1st |
Logical operators from DataCamp | HW1, HW2 | HW1 answers, HW2 answers | June 1st |
Aggregates | HW3 | HW3 answers | June 1st |
Joining Tables | HW4 | Link coming soon! | June 1st |
Mean, Median and Mode | | | June 1st |
Variance and Standard Deviation | | | June 1st |
Quartiles, Quantiles and IQR | | | June 1st |
Hypothesis testing | HW5 | Link coming soon! | June 1st |

### Phase 2

Make sure you have RStudio up and running on your own machine. Instructions for how to do this should have been provided in Lesson 1 of Codecademy, but if you didn’t get a chance to set that up, there are instructions here.

Lesson Title | Source | Due date |
---|---|---|
If/else statements | Software Carpentry | June 15th |
Natural Language Processing | TBD | June 15th |
Tenets of writing good code in R | Software Carpentry | June 15th |

### Phase 3

This phase will focus on understanding conceptual topics foundational to genomics-based/statistical research. The slide decks below are referenced from UBC’s STAT540 course. They are used with permission from Sara Mostafavi (course director).

Lesson Title | Source | Date |
---|---|---|
Exploratory data analysis and quality control | STAT540 | TBD |
Stats/math background for big data | STAT540 | TBD |
Two group comparisons | STAT540 | TBD |
ANOVA | STAT540 | TBD |
Linear models | STAT540 | TBD |
Linear modeling with continuous variables | STAT540 | TBD |
Multiple testing | STAT540 | TBD |
Principal component analysis | STAT540 | TBD |
Cluster analysis | STAT540 | TBD |
Supervised learning | STAT540 | TBD |
Supervised learning part 2 | STAT540 | TBD |