Hello! Please follow these instructions to download and install the necessary software and files. Note that it is not enough to have just R and tidyverse installed (which you might have) - you will also need a number of packages and a worksheet file, and your R should be up to date. I’d also recommend using RStudio instead of the plain R GUI. The instructions below come with troubleshooting steps - if something seems to be amiss or not working as intended, make sure you’ve read through everything.
Good luck!
- Andres Karjus
This document contains step-by-step instructions for:
All steps except 6 are mandatory and should be done before the workshop starts.
Importantly, if something went wrong and you could not install the software, please get in touch before the workshop starts so we can try to quickly troubleshoot. We cannot afford to waste any time on issues like installation during the workshop.
The installation process only takes a few clicks. But before you start, please make sure your operating system is up to date as well (particularly Macs: there are known conflicts between old versions of R and some newer packages, which will manifest if you have a Mac with an old version of the Mac OS, which in turn would lead you to download an old version of R).
First and foremost, you need R. If you already have R installed, please still update it to the most recent version, i.e. R version 4.0.2 (2020-06-22). Updating is done just by downloading the most recent installer and installing. Depending on your operating system, go to:
Download the installer and install (with default options, just keep clicking Next). Run R once to see that it works (in Windows, Rgui.exe should appear as a shortcut in the start menu and/or desktop; on a Mac, look for the R application in Finder). It should look something like this, depending on your OS:
Good job. Now close R (if it asks to save the workspace, say no). Once you get RStudio, there is no need to look at this ugly interface ever again.
While it is fine to use R from the command line or the bare-bones R interface application, we are going to use RStudio instead, which will make using R a lot easier and less of a hassle. It also has nice support for R Markdown, which we will be using.
Common issues and troubleshooting:
RStudio is an integrated development environment for R (which is why we had to install that first) - the Console panel on the left is basically the same thing that you saw when you ran “plain” R. But RStudio also features a number of very helpful features that will become apparent in the workshop. It comes with a handy script editor, which we are going to use right away.
Before we do that, we need to quickly change two options in RStudio to make it behave in a more useful way for us (fortunately, the RStudio interface is highly customisable).
Soft-wrap R source file
(this will make using the script editor much easier, by wrapping long lines so you won’t have to keep scrolling left and right all the time). See below for illustration.Show output inline for all R Markdown documents
(i.e. make sure the tick box is empty). This will disable notebook-style plot previews in the script editor and show plots in the Plots pane.Almost done! We need to make sure your RStudio and certain packages get along so we can use R Markdown and some more advanced plotting tools in the workshop. During this process, RStudio might need to download a few things - make sure you have internet access.
3.1. Copy-paste this bit of code into the R console in RStudio and press Enter. It should start downloading packages, indicated by some red text telling you it’s downloading from such and such url. Read though the rest of the steps while it’s doing that. Depending on internet and computer speed, it may take around 5 minutes: besides the packages named here, all their numerous dependencies will also be downloaded. The script will report success/failure in the end. Common issues and troubleshooting:
Do you want to install from sources the package which needs compilation? (Yes/no/cancel)
type or click no and press enter.p=c("rmarkdown","tidyverse","languageR","ggmosaic","patchwork","quanteda","rworldmap","maps","raster","rayshader","ggbeeswarm","ggridges","plotly","RColorBrewer","gapminder","igraph","ggraph","tidygraph","visNetwork","text2vec","scales","ggrepel"); install.packages(p); x=p%in%rownames(installed.packages());if(all(x)){print("All packages installed successfully!")}else{print(paste("Failed to install:", paste(p[!x]), ", try again and make sure you have internet connection."))};rm(x,p)
More troubleshooting:
Would you like to create a personal library...
- click yes. Alternatively, if on Windows: start RStudio as administrator (right-click the RStudio icon, choose run as administrator), this will allow it to install packages to the Program Files folder. Second alternative, install Rstudio in a location with write access, i.e. not in Program Files.RcppArmadillo
: ignore it.package ... is not available (for R version ...)
- seems you didn’t update R, see above.3.2. Now let’s make sure R Markdown works. Click on New File (either in the menu, or white button in the top left corner), and choose “R Markown…”. 3.3. At this point, RStudio might ask you to install some packages (although this should have been taken care of in the previous step). If so, just click “Yes” and wait for it to finish. 3.3. When this is done, a new window will appear, titled “New R Markdown…”. Just click “OK” to create the default document. 3.4. A new script file tab should appear in the script editor, probably titled “Untitled.Rmd”. It has some example contents. 3.5. Save the new .Rmd file (click on the little blue save icon, give it some name and save). 3.6. Now click on the little “Knit” icon (with the blue ball of yarn) on top of the script panel. A new window should appear, containing a simple webpage, titled “Untitled”, telling you that “This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML…”
Feel free to close this window now. If all this worked, great! If something did not work here, try restarting RStudio once and redoing the steps. If still no luck, find me beforehand so we can fix this.
You will need to ownload a script file for the workshop. Right-click (Mac ctrl-click) here and choose “Save link as…” (or “Download linked file”, or similar) and save the file.
Make sure that you save it as an .Rmd
file, not a .txt
file: the file on your computer should be named dataviz_worksheet.Rmd
.
Open it in RStudio (if RStudio is properly installed, double-clicking the file should do it). That’s it, you’re all set. Feel free to have a look around, but don’t be daunted by the number of exercises and amount of code - you’ll be guided gently through all of it, and some exercises are meant to be extra bonus stuff for those who already know R. If you’d like to have a look at the html version with the plots and all (which can easily be generated from the Rmd using knitr), click here (regarding the unfinished look of the plots - making them pretty is going to be your job).
Troubleshooting: if your script looks something like the top image, then you’re all good; if your script looks like the bottom image, then you somehow managed to save the script as a text file - RStudio won’t recognize it as a script, and you won’t be able to directly run code in the code blocks. Don’t worry - just go to the folder you saved it in, change the file type suffix to “.Rmd” and reopen it. Some operating systems hide the file type suffix; change this option if needed.
If you don’t know R (and particularly if you are also not intending to attend any other R-based workshops in the summer school on Monday or Tuesday), but would like to get a head start, I would highly recommending doing this 5-minute exercise before the workshop begins. It’s just a quick introduction to how basic R syntax works. If you’re new to programming, after doing this, you’ll feel much more comfortable tackling the more complex exercises in the workshop. If it’s been a while since you used R, feel free to do it as well as a refresher.
Below is a block of code. What you need to do is select and copy this block of code, and paste it into the RStudio script editor. First create a new script by clicking on the New icon in the top left corner of RStudio (or File->New File) and choose “R Script”.
A blank text file will open above the Console panel, titled something like Untitled1.R
.
Good job! Now copy-paste the code below (the gray box) into the new script file you just created, and follow the instructions therein.
# Start of the exercise. You should be reading this message in the RStudio script editor.
print("Hello! Put your text cursor on this line (click on the line). Anywhere on the line. Now press CTRL+ENTER (PC) or CMD+ENTER (Mac). Just do it.")
# The command above, when executed (what you just did), printed the text in the console below. Also, this here is a comment. Commented parts of the script (anything after a # ) are not executed.
# Also, if you've been scrolling left and right in the script window to read all this, turn on text wrapping ASAP: on the menu bar above, go to Tools -> Global Options -> Code (on the left) -> tick "Soft-wrap R source files". Better, right?
# So, print() is a function. Most functions look something like this:
# myfunction(inputs, parameters)
# All the inputs to the function go inside the ( ) brackets. In the above case, the text (all text, or "strings", must be within quotes) is the input to the print() function.
# Let's try another function, sum().
sum(1,10) # cursor on the line, press CTRL+ENTER (or CMD+ENTER on Mac)
# You should see the output (sum of 1 and 10) in the console. Note that you can also write commands directly in the console, and executing them with ENTER. Try some simple maths: write 2*3+1 in the console below, and press ENTER.
# Important: you can always get help for a function and check its input parameters by executing
help(sum) # put the name of any function in the brackets
# ...or by searching for the function by name in the Help tab on the right, or by selecting a function name in the code and pressing F1.
# Let's plot something. The command for plotting things is, surprisingly, plot().
plot(42, main = "Greatest plot in the world") # execute the command; a plot should appear on the right. If it appears inline in the script editor window, go back to the pre-workshop instructions and follow the steps for configuring R Markdown output.
# OK, this was not very exciting. But notice that a function can have multiple inputs, or arguments. In this case, the first argument is the data (a vector of length one), and the second is 'main', which specifies the main title of the plot.
# Note that you can make to plot bigger by pressing the 'Zoom' button above the plot panel on the right.
# Let's create some data to play with. We'll use the sample() command, which creates random numbers from a predifined sample. Basically it's like rolling a dice some n times, and recording the results.
sample(x = 1:6, size = 50, replace = T) # execute this; its output is 50 numbers
# If an output is not assigned to some object, it usually just gets printed in the console. It would be easier to work with data, if we saved it in an object. For this, we need to learn assignement, which in R works using the equals = symbol (or the <-, but let's stick with = for simplicity).
xdata = sample(x = 1:6, size = 50, replace = T) # what it means: xdata is the name of a (new) object, the equals sign (=) signifies assignement, with the object on the left and the data on the right. In this case, the data is the output of the sample() function. Instead of printing in the console, the output is assigned to the object.
xdata # execute to inspect: calling an object usually prints its contents into the console below.
# Let's plot:
plot(xdata) # plots data as it is ordered in the object
hist(xdata, breaks=20, main="Frequency of dice values") # plots a histogram (distribution of values)
# Since we already installed the plotly package, why not try it out:
library(plotly) # load the package
ydata = xdata+runif(length(xdata),-1,1) # some more random data
plot_ly(x=~xdata, y=~ydata, type="scatter", mode="markers", alpha=0.5, marker=list(size=10)) # creating interactive plots is easy! (also all the stuff beyond "y=~" is optional, only for making it prettier)
# Well done!! This is the end of the optional pre-workshop exercise. Congratulations, you have successfully installed R and RStudio, and have now learned the basics of programming (functions, assignement, parameters, input-output; as well as authoring basic webpages).
# You have also learned how to make simple plots in R. This will form the basis of the workshop, where we will learn how to recognize ways to plot different kinds of data (not just random dice rolls!), and how to modify the plotting parameters in order to create informative but also visually more pleasing graphs.
# Best of all, all the code is reusable in the sense that you can easily use the very same commands that we will be working with to plot your own data later on, just by changing the inputs.
# Also, the "programming" side of things will not get any more complicated than what you did in the last 10 minutes. That is literally all you need - but we will explore how to use these basic skills in various different ways.