This course is offered in collaboration with the Graduate School of Production Ecology and Resource Conservation of The Netherlands. Registration is open to everyone. If you are a PhD student, postdoc or staff member of any of the affiliated Graduate Schools, you might be eligible for a discount; more details will be available once the course is announced.
Scope of the course
It is often said that 80% of the effort in a data analysis pipeline goes into the tedious process of cleaning and preparing data correctly so that they can be used for analysis and visualization (Dasu & Johnson, 2003).
Tidy data makes data transformation and visualization easier. It works hand in hand with the tools provided by the tidyverse collection of R packages, in a way that promotes reproducibility and efficiency. ggplot2 (Wickham, 2009) is one of the core members of the tidyverse and one of the most widely used R packages for data visualization. In this workshop, participants will learn the principles of tidy data, how to transform and combine datasets using the tools from the tidyverse, and how to generate advanced visualizations with the ggplot2 package.
Dasu, T., & Johnson, T. (2003). Exploratory Data Mining and Data Cleaning. https://doi.org/10.1002/0471448354
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. Retrieved from http://ggplot2.org
Is this workshop for me?
- Do you routinely spend long days transforming and cleaning your Excel files to get them ready for analysis and/or making plots?
- Do you work with tricky data files (measurements from different types of equipment, normally provided as raw text files)?
- Do you often struggle to make sense of big, complex datasets?
- Do you often have to combine different datasets in order to perform your research?
- Do you want to communicate your findings in a beautiful and reproducible way by generating publication-ready plots?
Then this workshop will equip you with the skills to tackle the above use cases and many more!
Participants should be familiar with the concepts taught in the course “Introduction to R and RStudio” and be comfortable working with:
- Vectors, lists and data frames
- Importing and saving data
- Using functions
- Participants will be taught the principles of tidy data, how to best structure their data using the tools of the tidyverse and the concept of data analysis pipelines.
- Participants will learn the basic concepts of relational data and how to combine different datasets in a reproducible and efficient manner.
- Participants will learn the syntax and philosophy of the grammar of graphics as implemented by the ggplot2 package.
- Participants will learn how to make different types of visualizations using the ggplot2 package. They will be able to explore their own data sets using scatter plots, boxplots, bar charts, smooth fitted lines in scatter plots, etc.
- Participants will learn how to customize their figures to achieve publication-level quality, by adjusting the labels, legends, colors, and coordinate systems, among others.
- Participants will be introduced to interactive data visualization and visualization of geographical data using interfaces in R to state-of-the-art web technologies.
- The workshop consists of interactive presentations that are frequently interrupted by short do-it-yourself periods during which participants solve exercises of increasing complexity. Participants will spend at least half of the time writing R code and thinking about data science problems.
- To practice all the skills taught in the workshop, participants will solve a small project on the last day using a dataset from a real research project. The different challenges faced by the participants will then be discussed at the end of the course as a wrap-up.
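As a rough illustration of the tidy data, relational data and ggplot2 skills listed above, the sketch below reshapes a small made-up table, combines it with a second table and plots the result. The datasets, column names (plot_id, biomass, treatment) and values are hypothetical and are not taken from the course material; they only show the kind of pipeline the workshop covers.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical wide-format measurements: one column per sampling day
yield_wide <- tibble(
  plot_id = c("A", "B", "C"),
  day_1   = c(1.2, 0.9, 1.5),
  day_2   = c(2.3, 1.8, 2.9)
)

# Hypothetical lookup table describing each plot
plots <- tibble(
  plot_id   = c("A", "B", "C"),
  treatment = c("control", "fertilized", "fertilized")
)

# Tidy the measurements (one row per plot and day),
# then attach the treatment of each plot with a join
yield_tidy <- yield_wide %>%
  pivot_longer(
    cols         = starts_with("day_"),
    names_to     = "day",
    names_prefix = "day_",
    values_to    = "biomass"
  ) %>%
  mutate(day = as.integer(day)) %>%
  left_join(plots, by = "plot_id")

# Build the plot layer by layer, following the grammar of graphics
p <- ggplot(yield_tidy,
            aes(x = day, y = biomass, colour = treatment, group = plot_id)) +
  geom_line() +
  geom_point() +
  labs(x = "Day", y = "Biomass (made-up units)", colour = "Treatment")

print(p)
```

Because every variable sits in its own column after the reshaping step, the same yield_tidy table can be reused for other plot types (for example boxplots per treatment) by changing only the geom and the aesthetic mapping.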
The course is spread across four days in two consecutive weeks and takes place on a dedicated Microsoft Teams group that will be created for the course. Each day of the course will be broken into three sections by a lunch break (1 hour) and two shorter breaks (20 minutes). Each of the first two sections of a course day will be chaired by one of the instructors, who will share their computer screen with the rest of the participants. During these sections, theoretical concepts will be taught via a presentation, mixed with live coding practice (i.e. the instructor writes the code to solve a problem), interaction with participants, and short exercises (2 to 3 minutes each) performed by participants on their own. The other instructor will answer questions in the chat of the Teams group.
COURSE DATES ARE NOT SET YET, PLEASE REGISTER YOUR INTEREST IN THE FORM BELOW