Chapter 4 git with RStudio and GitLab

Why do we care about version control?

From Wikipedia: “the management of changes to documents, computer programs, large web sites, and other collections of information.” Basically, it’s a way for us to compare, restore, and merge changes to our stuff. We want to avoid this:

PhD comics – a tale of many versions

4.1 What is git?

Git is a revision control system. The purpose of git is to manage a project, or a set of files, as it changes over time. Git stores this information in a data structure called a repository. A Git repository contains, among other things, the following:

  • Snapshots of your files (any kind of file, though git works best with plain-text files like R scripts, where it can show exactly which lines changed)
  • References to these snapshots, called heads

The git repository is a hidden sub-folder in your project folder, called .git. You will probably never have to touch it, but definitely don’t delete it: it holds your project’s entire history.
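
If you’re curious, you can peek at this hidden folder from a terminal. Here is a minimal sketch, assuming a Unix-style shell (the Mac Terminal, Linux, or Git Bash on Windows) and a throwaway folder created with mktemp. git init is the command that creates .git in the first place, and git rev-parse --git-dir asks git where it lives.

```shell
# A throwaway folder to experiment in
demo="$(mktemp -d)"
cd "$demo"

# Turn the folder into a git repository; this creates the hidden .git folder
git init

# ls -a also lists hidden entries, so .git shows up here
ls -a

# Ask git where the repository data lives (prints .git)
git rev-parse --git-dir
```

In a real project folder you would skip the mktemp line and run git init there, or let RStudio do it for you as in section 4.3.1.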

4.1.1 How git does version control

Git works on branches, which represent independent lines of development. Each snapshot (called a commit) is linked to the ‘parent’ snapshot it was built upon. By default, every repository starts out on a single branch, traditionally called “master” (newer setups may call it “main”). A few good tutorials on branches can be found on House of Hades and the Atlassian guides.
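
To see branches and parent links in action, here is a minimal sketch for a Unix-style shell in a throwaway folder. The branch name experiment and the file notes.txt are hypothetical, and the branch you start on will be called master or main depending on your git setup. The sketch makes one snapshot, starts a new branch from it, makes a second snapshot there, and draws the links between them.

```shell
cd "$(mktemp -d)"
git init
# Identity for this throwaway repository only
git config user.name "Your Name"
git config user.email "your@email.com"

# First snapshot, on the default branch
echo "first draft" > notes.txt
git add notes.txt
git commit -m "First snapshot"

# Start an independent line of development (experiment is a hypothetical name)
git checkout -b experiment
echo "a risky idea" >> notes.txt
git commit -am "Experimental snapshot"

# --graph draws each commit's link to its parent; --all shows every branch
git log --oneline --graph --all

# List the branches; * marks the one you are on
git branch
```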

There are three states that your git project can be in:

  1. Working directory: you are just working normally on your files.

  2. Staging area: you stage your work, so git knows it could potentially become the next version.

  3. Repository: your changes become the newest version in the repository!

As you work, you move between these three states many, many times throughout the life of a project. You do this with some simple commands in the terminal, OR in RStudio!

Git stages from https://git-scm.com/about
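
The same three states map onto two terminal commands, git add and git commit. Here is a minimal sketch, assuming a Unix-style shell and a throwaway folder. analysis.R is just an example file, and the name and email are set for this one repository only, so the sketch works even before you finish section 4.2.

```shell
cd "$(mktemp -d)"
git init
# Identity for this throwaway repository only (section 4.2 sets it globally)
git config user.name "Your Name"
git config user.email "your@email.com"

# 1. Working directory: create or edit files as usual
echo "x <- 1:10" > analysis.R

# 2. Staging area: mark the change as a candidate for the next version
git add analysis.R

# 3. Repository: record the staged change as the newest version
git commit -m "Add analysis script"

# Nothing is left to stage or commit, so this prints nothing
git status --short
```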

4.2 Configure Git & Git with RStudio

Before you can use git with RStudio, you first need to configure git with your name and email address. In a project, everyone needs to see exactly what the other collaborators have been doing, so git labels every change with its author’s name and email. You set these with two commands run in the Terminal.

On Windows, you can search cmd to get to the terminal, and on Mac, you can search Terminal in the Spotlight search. You should see a small terminal window show up. Type:

git config --global user.name "Your Name"

And replace Your Name with your given name and your family name, keeping the double quotes (they work in both the Windows and Mac terminals). Hit Enter after you’ve typed the full line. Next, type the following:

git config --global user.email "your@email.com"

Replace your@email.com with your email address. It should look like this:

Configure git in the terminal
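
To double-check what git saved, you can read the settings back. Here is a minimal sketch for a Unix-style shell (the Mac Terminal, or Git Bash on Windows). Its first line points HOME at a throwaway folder so you can try it without touching your real settings. Leave that line out when you configure git for real. Your Name and your@email.com are placeholders.

```shell
# Try this in a throwaway home folder; leave this line out for your real setup
export HOME="$(mktemp -d)"

git config --global user.name "Your Name"
git config --global user.email "your@email.com"

# Read each value back; git prints exactly what it stored
git config --global user.name
git config --global user.email

# Global settings are kept in a plain-text file in your home folder
cat "$HOME/.gitconfig"
```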

After making sure git is set up correctly, you can then configure RStudio to use git!

  1. Open RStudio
  2. Click Tools -> Global Options -> Git/SVN
  3. Check that git has a program associated with it. If Git executable shows ‘(none)’, click Browse and select the git executable installed on your system.
    • On a Mac, this will likely be one of the following: /usr/bin/git, /usr/local/bin/git, or /usr/local/git/bin/git
    • On Windows, git.exe will likely be somewhere in Program Files or Program Files (x64).
  4. Click OK
  5. Restart RStudio

Configuring git in RStudio
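
If you’re not sure which path to give RStudio, the terminal can tell you. Here is a minimal sketch, assuming the Mac Terminal. In the Windows Command Prompt, where git prints the location of git.exe instead.

```shell
# Full path of the git program your shell finds first
# (on a Mac, something like /usr/bin/git or /usr/local/bin/git)
command -v git

# Check that git runs, and which version is installed
git --version
```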

4.3 Working with Git in RStudio

4.3.1 Adding Git to a project

Version control in RStudio can only be done on the project level. To use git with RStudio, you need to either add git to an existing project or start a new project with git enabled from the start. To add git to a new project in RStudio, all you need to do is check a box!

  1. Open RStudio
  2. Click File -> New Project -> New Directory -> Empty Project
    • Check Create a git repository for this project

Adding git to a new RStudio project

To add git to an existing project in RStudio:

  1. Open your project in RStudio (click File -> Open Project)
  2. Click Tools -> Project Options
  3. In Project Options, click the Git/SVN tab.
  4. Change the “Version Control System” from “None” to “Git”
  5. [Optional] Add a link to the remote repository.

Adding git to an existing RStudio project
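
For reference, the terminal version of this is roughly git init run inside the project folder, plus git remote add if you want to link a remote right away. Here is a minimal sketch for a Unix-style shell. A throwaway folder stands in for your project, and the GitLab URL is hypothetical.

```shell
# Stand-in for an existing project folder that already has files in it
project="$(mktemp -d)"
cd "$project"
echo "# My analysis" > README.md

# Put the folder under version control
git init

# Optional: link a remote repository (hypothetical URL; use your own)
git remote add origin https://gitlab.com/your-username/your-project.git

# List the remotes git now knows about
git remote -v
```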

4.3.2 Working with Git in your project

So, just by virtue of doing your normal work within a git repository, you are in the working directory state. Say you want to tell git about some changes you’ve made to your files: you need to add them! Here’s what it looks like in RStudio when you have files that are untracked:

Unstaged files in RStudio

You can see here that there is a bright yellow ? next to the files that are untracked, and a green A next to the files that have been added. To add those two untracked images, just double click the question mark or check the Staged check box next to each one! Then we’ll have everything staged.

Adding files in RStudio
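
The same information is available in a terminal through git status --short, which uses ?? for untracked files and A for added ones, much like RStudio’s icons. Here is a minimal sketch in a throwaway folder, assuming a Unix-style shell. The two empty .png files stand in for the images above.

```shell
cd "$(mktemp -d)"
git init

# Two new images git has never seen before
touch plot1.png plot2.png

# Both are listed with ?? (untracked), like RStudio's yellow ?
git status --short

# Stage one of them
git add plot1.png

# plot1.png now shows A (added), like RStudio's green A; plot2.png is still ??
git status --short
```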

Git also lets you choose which parts of your files you want to commit. Say you’re working on two analysis notebooks. One is done, but the other is unfinished. You’d like to make a commit and go home (5 o’clock, finally!), but you don’t want to commit the second notebook, which isn’t done yet. So you stage only the parts you know belong to the first notebook, and commit.

All that to say – you commit your changes after you finish adding everything you want to for the moment. When you commit a file, you are telling git that this is the new version of a file. To make a commit in RStudio you must:

  • Click the Git tab
  • Check Staged next to the files you’ve added
  • Click Commit to open the commit window
  • Type a message in Commit message
    • this is a message to future you, and future you doesn’t respond to email! Make this descriptive and concise! Don’t be this person. https://xkcd.com/1296/
  • Click Commit again to record the new version

Commit message in RStudio
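
In a terminal, staging and committing are git add and git commit -m "message". Here is a minimal sketch of the two-notebook scenario in a throwaway folder, assuming a Unix-style shell. The notebook names are examples. The sketch stages whole files. To pick individual chunks inside a file instead, git add -p walks you through each change interactively.

```shell
cd "$(mktemp -d)"
git init
# Identity for this throwaway repository only
git config user.name "Your Name"
git config user.email "your@email.com"

# One finished notebook and one still in progress (example names)
echo "finished analysis" > notebook-1.Rmd
echo "work in progress" > notebook-2.Rmd

# Stage only the finished notebook, then commit it with a descriptive message
git add notebook-1.Rmd
git commit -m "Finish notebook 1"

# notebook-2.Rmd was left out of the commit and is still untracked (??)
git status --short
```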

You can see what has changed in a given file since its last commit in this window as well:

Full commit screen in RStudio
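
The terminal equivalent of this view is git diff, which marks removed lines with - and added lines with +. Here is a minimal sketch in a throwaway folder (a Unix-style shell is assumed, and analysis.R is an example). Plain git diff compares your files with what is staged, while git diff --staged shows what the next commit will contain.

```shell
cd "$(mktemp -d)"
git init
git config user.name "Your Name"
git config user.email "your@email.com"

printf 'x <- 1:10\nmean(x)\n' > analysis.R
git add analysis.R
git commit -m "Add analysis script"

# Change one line after the commit
printf 'x <- 1:10\nmedian(x)\n' > analysis.R

# Unstaged changes since the last commit: -mean(x) and +median(x)
git diff

# After staging, the same change shows up under --staged instead
git add analysis.R
git diff --staged
```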

Git has recorded a complete history of your work. To see all the changes for the project, you just go to the Git tab and click the History button!

Git history in RStudio
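
From a terminal, the same history comes from git log. Here is a minimal sketch in a throwaway folder, assuming a Unix-style shell. The three commits are made up just to have something to look at.

```shell
cd "$(mktemp -d)"
git init
git config user.name "Your Name"
git config user.email "your@email.com"

# Build a short history of three commits
for step in 1 2 3; do
  echo "step $step" >> notes.txt
  git add notes.txt
  git commit -m "Add step $step"
done

# One line per commit, newest first: short commit ID and message
git log --oneline

# The history of a single file, including the lines changed each time
git log -p -- notes.txt
```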

Sometimes we make a change that doesn’t work so well in the end. It might be a straight-up error, or you decide that what you wrote isn’t the best way to do something. In the event of errors or inconsistencies in your work, you can browse through your history, find the change that’s to blame, and restore your previous good work. For changes you haven’t committed yet, RStudio can revert a file to its last committed version:

  • Go to the Git tab in RStudio
  • Click Diff
  • Select one or more files and review the differences
  • Click Revert

Revert window in RStudio

  • Confirm you actually want to revert your change

Confirm your revert in RStudio

The erroneous change has been undone and the previous version restored!

After a revert in RStudio
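
In a terminal, the right tool depends on whether the change was committed. RStudio’s Revert button does essentially what git checkout -- <file> does: it restores the last committed version of the file. Git 2.23 and later also offer git restore <file> for the same job. For a change that is already committed, git revert makes a new commit that undoes it while keeping the history intact. Here is a minimal sketch of both, in a throwaway folder with an example file, assuming a Unix-style shell:

```shell
cd "$(mktemp -d)"
git init
git config user.name "Your Name"
git config user.email "your@email.com"

echo "good work" > results.txt
git add results.txt
git commit -m "Add results"

# Uncommitted mistake: discard it and restore the last committed version
echo "oops" > results.txt
git checkout -- results.txt

# Committed mistake: record a new commit that undoes the bad one
echo "bad idea" > results.txt
git commit -am "Try a bad idea"
git revert --no-edit HEAD

# results.txt holds the good version again (prints: good work)
cat results.txt
```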

4.4 GitLab (with R!)

Now that we know some git, we can use git repository hosting platforms for collaboration and open science! One of the very best is GitLab.

There are many features that set GitLab apart from other services. At the time of writing, it offers free and unlimited public and private repositories. It has continuous integration built in, and for each repository you can use either the built-in Docker registry or an image from Docker Hub, with no extra configuration required (simply call the container from the continuous integration!). GitLab also offers free Git LFS (Large File Storage), so we can share larger files within a repository. Another big plus: it integrates with a lot of great tools and services, like JIRA, Kubernetes, and the Open Science Framework.

Assuming everyone has a GitLab account (if not, make one quickly at gitlab.com): when you are logged into gitlab.com, you should be able to see a + sign in the top right-hand corner. This lets you create a new, empty repository! You can choose the visibility level of the repository: 100% private, internal (private but visible to folks logged into GitLab), or 100% public.

New GitLab repository

We can add any GitLab repository easily when we start a new RStudio project.

  1. Go to File > New Project > Version Control
  2. Choose Git from the dropdown menu
  3. In the “Repository URL” field, paste the URL of your new GitLab repository. It will be something like this: https://gitlab.com/VickySteeves/hello-world.git.
  4. Choose where to put the project on your computer, then click Create Project

Adding git to a new RStudio project
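
Behind the scenes, this is git clone. Here is a minimal sketch for a Unix-style shell that runs offline. A local, empty bare repository stands in for the new GitLab repository, so with a real project you would pass its URL (such as https://gitlab.com/VickySteeves/hello-world.git) instead of the local path.

```shell
work="$(mktemp -d)"
cd "$work"

# Stand-in for the new, empty repository you created on GitLab
git init --bare hello-world.git

# Copy it to your computer; with GitLab you would pass the
# https://gitlab.com/... URL here instead of a local path
git clone hello-world.git hello-world
cd hello-world

# The clone remembers where it came from, under the name origin
git remote -v
```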

This means that we can sync our local changes to a repository hosted on GitLab! Now, we have to PUSH all our locally created content to the origin remote, which is the name git gives the repository we cloned from. This adds 1 more step to what you already know how to do:

  1. Work on your files
  2. Add your files so git knows you want to track their changes
  3. Commit any changes you want to make the new version
  4. Send these changes to the repository hosted on GitLab by simply clicking the push button on the Git panel.

Pushing to GitLab from RStudio
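
From a terminal, step 4 is git push. Here is a minimal sketch for a Unix-style shell. A local bare repository again stands in for GitLab so the sketch runs offline, and README.md is an example file.

```shell
work="$(mktemp -d)"
cd "$work"

# Stand-in for GitLab: an empty bare repository, cloned as RStudio would
git init --bare hello-world.git
git clone hello-world.git hello-world
cd hello-world
git config user.name "Your Name"
git config user.email "your@email.com"

# Steps 1 to 3: work, add, commit
echo "# Hello world" > README.md
git add README.md
git commit -m "Add README"

# Step 4: push the new commit to origin; -u records origin as the
# upstream of this branch, so later pushes can be a plain git push
git push -u origin HEAD
```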

Go refresh your browser to see your changes!

We can keep a copy of our code locally and in this central repository on GitLab. This helps us make sure our code isn’t only stored in one place (our laptops) at any given time. But it also lets us collaborate on code with our colleagues and also strangers!

For our colleagues, we can add them as collaborators within our repository with varying levels of permission; we can even give them an expiration date, if their term on a project ends on a certain date!

Looking at our collaborators in GitLab

Everyone else, whom we don’t want to give direct push access to the repository, must fork our repository and submit a merge request to get their code integrated into ours!

Forks & Merge Requests

A fork is a copy of a repository in your namespace (under your account). Forking a repository allows you to freely experiment with changes without affecting the original project.

A merge request asks to integrate the changes you made in your fork into the original repository you forked from. You describe the changes you made and make sure they don’t conflict with the original repo’s code.

The first step to contributing to a code repository where you don’t have push access is to fork it. GitLab has made this as easy as a button click:

Forking a repository in GitLab

You can then choose where you want to put the new repository – into your own account, or a group account!

Forking a repository in GitLab

You can edit, add, commit, push, and pull exactly as you would with your own code, since the fork is under your account now!

Forking a repository in GitLab

When you want your changes to be integrated into the official/original repository, you make a merge request! GitLab has made this easy for us, too. Click the ‘Merge Requests’ tab on the GitLab sidebar. From there, it’s a simple button click to start your merge request:

Starting your merge request in GitLab

Then, GitLab will show you all the changes made and the differences between the code in each repository. You can compare them to make sure you don’t have any conflicts! Then, you’ll have to describe all the changes you’ve made to the code:

Forking a repository in GitLab

The last step is simply to submit the merge request and await feedback!
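
Forking and opening the merge request happen on GitLab, but the work in between is ordinary git. One common convention (not something GitLab requires) is to clone your fork, add the original repository as a second remote named upstream, and make your changes on their own branch. Here is a minimal sketch for a Unix-style shell. Local bare repositories stand in for the original project and your fork, and the branch name fix-typo is hypothetical.

```shell
work="$(mktemp -d)"
cd "$work"

# Stand-in for the original project on GitLab, seeded with one commit
git init --bare original.git
git clone -q original.git seed
(
  cd seed
  git config user.name "Maintainer"
  git config user.email "maintainer@example.com"
  echo "# Original project" > README.md
  git add README.md
  git commit -q -m "Initial commit"
  git push -q origin HEAD
)

# Stand-in for clicking Fork: a full copy of the original repository
git clone -q --bare original.git my-fork.git

# Clone YOUR fork, then add the original project as a second remote
git clone -q my-fork.git project
cd project
git config user.name "Your Name"
git config user.email "your@email.com"
git remote add upstream "$work/original.git"

# Do the work on its own branch (fix-typo is a hypothetical name)
git checkout -b fix-typo
echo "Fixed a typo" > CHANGES.md
git add CHANGES.md
git commit -m "Fix a typo"

# Push the branch to your fork; on GitLab you would then open a
# merge request from fix-typo into the original repository
git push -u origin fix-typo

# Later: fetch new work from the original project to stay up to date
git fetch upstream
```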

4.5 Do more with GitLab

We can do more than just version our code with GitLab. We can also create websites using GitLab pages, host large files with free git LFS, or manage projects with issues and boards!

Since we’re talking about publishing geoscience papers, we take a closer look at how we can use GitLab and GitLab’s continuous integration features to render R Markdown files in-browser in Section 5.5.

4.6 Further reading

Git:

  • Pro Git book: The entire Pro Git book, written by Scott Chacon and Ben Straub and published by Apress (available in many languages!).
  • TryGit: enter git commands in-browser to help reaffirm beginner git skills!
  • Git: The Simple Guide: step-by-step Git tutorial.
  • Think Like A Git: for someone who’s been using Git, but doesn’t feel they really understand it.

GitLab:

GitLab & R: