Chapter 4 git with RStudio and GitLab
Why do we care about version control?
From Wikipedia: “the management of changes to documents, computer programs, large web sites, and other collections of information.” Basically, it’s a way for us to compare, restore, and merge changes to our stuff. We want to avoid this:
4.1 What is git?
Git is a revision control system. The purpose of git is to manage a project, or a set of files, as it changes over time. Git stores this information in a data structure called a repository. A Git repository contains, among other things, the following:
- Snapshots of your files (text, images, or anything else; git can store binary files too, but it tracks changes to plain text best)
- References to these snapshots, called heads
The git repository is a hidden sub-folder in your project folder, called .git. You probably won’t ever have to touch it, but definitely don’t delete it.
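To see the hidden folder for yourself, here is a minimal sketch run in a throwaway directory. The variable name demo_dir is only illustrative, and mktemp is assumed to be available (it is on macOS and Linux, and in Git Bash on Windows):

```shell
# Create a scratch folder so none of your real projects are touched
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Turn the folder into a git repository
git init --quiet

# List everything, including hidden entries; .git is the repository itself
ls -a
```

The listing should include a .git entry alongside . and .., which is all a brand-new repository contains.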
4.1.1 How git does version control
Git works on branches, which represent independent lines of development. Each snapshot is linked to the ‘parent’ snapshot it was built upon. By default, a new repository starts on a single branch, traditionally called “master” (newer versions of git and GitLab may call it “main”). A few good tutorials on branches can be found on House of Hades and the Atlassian guides.
There are three states that your git project can be in:
- Working directory: you are just working normally on your files.
- Staging area: you stage your work, so git knows it could potentially become the next version.
- Repository: you commit, and your staged changes become the newest version in the repository!
As you work, you move between these three states many, many times throughout the life of a project. These are done with some simple commands in the terminal, OR in RStudio!
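For the terminal route, the three states can be sketched roughly as follows. This assumes a scratch repository; the file name analysis.R and the name and email values are placeholders, set for this one repository only:

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
# Identity used for this scratch repository only (placeholder values)
git config user.name "Your Name"
git config user.email "your@email.com"

# 1. Working directory: create or edit a file as usual
echo "x <- 1" > analysis.R
git status --short    # prints "?? analysis.R" (untracked)

# 2. Staging: tell git this file belongs in the next version
git add analysis.R
git status --short    # prints "A  analysis.R" (added)

# 3. Repository: record the staged snapshot as the newest version
git commit --quiet -m "Add first analysis script"
git status --short    # prints nothing: everything is committed
```

The ?? and A codes are the same markers RStudio shows as icons in its Git tab.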
4.2 Configure Git & Git with RStudio
Before you can integrate git and R, you first need to configure git with your name and email address. Git attaches this identity to every version you record, so that everyone in a project can see exactly what the other collaborators have been doing. You set it with two commands run in the terminal.
On Windows, you can search for cmd to get to the terminal, and on Mac, you can search for Terminal in the Spotlight search. You should see a small black window show up. Type:
git config --global user.name 'Your Name'
Substitute ‘Your Name’ with your given name and your family name. Hit enter after you’ve typed the full line. Next, type the following:
git config --global user.email 'your@email.com'
Substitute ‘your@email.com’ with your email address. It should look like this:
After making sure git is set up correctly, you can then configure RStudio to use git!
- Open RStudio
- Click Tools -> Global Options -> Git/SVN
- You should be able to see that git has a program associated with it. If Git executable shows ‘(none)’, click Browse and select the git executable installed on your system.
- On a Mac, this will likely be one of the following: /usr/bin/git, /usr/local/bin/git, or /usr/local/git/bin/git
- On Windows, git.exe will likely be somewhere in Program Files or Program Files (x86).
- Click OK
- Restart RStudio
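If you are unsure where git lives on your machine, the terminal can tell you. This is a small sketch for macOS, Linux, or Git Bash; in the Windows cmd window, the equivalent of the first command is where git:

```shell
# Print the full path of the git program found on your PATH
command -v git

# Confirm that it runs, and report which version is installed
git --version
```

The path printed by the first command is the one to select with Browse if RStudio shows ‘(none)’.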
4.3 Working with Git in RStudio
4.3.1 Adding Git to a project
Version control in RStudio can only be done on the project level. To use git with RStudio, you need to either add git to an existing project or start a new project with git enabled from the start. To add git to a new project in RStudio, all you need to do is check a box!
- Open RStudio
- Click File -> New Project -> New Directory -> Empty Project
- Check Create a git repository for this project
To add git to an existing project in RStudio:
- Open your project in RStudio (click File -> Open Project)
- Click Tools -> Project Options
- In Project Options, click the Git/SVN tab.
- Change the “Version Control System” from “None” to “Git”
- [Optional] Add a link to the remote repository.
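The same setup can be done from the terminal. In this sketch, project_dir is a scratch stand-in for your existing project folder, and the remote URL is the example repository address used in this chapter; substitute your own:

```shell
project_dir="$(mktemp -d)"    # stand-in for your existing project folder
cd "$project_dir"
echo "# My project" > README.md

# Equivalent of changing "Version Control System" from "None" to "Git"
git init --quiet

# Optional: link the remote repository (this only records the address,
# nothing is contacted or uploaded yet)
git remote add origin https://gitlab.com/VickySteeves/hello-world.git

# Show the remotes git now knows about
git remote -v
```

By convention the main remote is named origin, which is the name RStudio also uses.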
4.3.2 Working with Git in your project
Simply by doing your normal work within a git repository, you are in the working directory state. Say you want to tell git about some changes you’ve made to your files. You need to add them! Here’s what it looks like in RStudio when you have files that are untracked:
You can see here that there is a bright yellow ? next to the files that are untracked, and a green A next to the files that have been added. To add those two untracked images, just double click the question mark or check the check box! Then we’ll have it all staged.
Git also lets you choose which parts of your work you want to commit. Say you’re working on some analysis notebooks. One is done, but the other is unfinished. You’d like to make a commit and go home (5 o’clock, finally!) but you don’t want to commit the second notebook, which is not done yet. You stage only the parts you know belong to the first notebook, and commit.
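On the command line, the two-notebook situation might look like the sketch below. The file names notebook_one.Rmd and notebook_two.Rmd are hypothetical, as are the name and email values:

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Your Name"
git config user.email "your@email.com"

echo "finished analysis" > notebook_one.Rmd
echo "work in progress"  > notebook_two.Rmd

# Stage and commit only the finished notebook
git add notebook_one.Rmd
git commit --quiet -m "Add finished first notebook"

# The unfinished notebook is left alone and still shows as untracked
git status --short    # prints "?? notebook_two.Rmd"

# To choose individual chunks inside a file that git already tracks,
# git also offers an interactive mode:
#   git add -p <file>
```

Staging whole files covers most day-to-day cases; the interactive mode is there for when one file mixes finished and unfinished changes.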
All that to say – you commit your changes after you finish adding everything you want to for the moment. When you commit a file, you are telling git that this is the new version of a file. To make a commit in RStudio you must:
- Click the Git tab
- Check Staged next to the files you’ve added
- Click Commit
- Type a message in Commit message
- this is a message to future you, and future you doesn’t respond to email! Make this descriptive and concise!
- Click Commit
You can see what has changed in a given file since its last commit in this window as well:
Git has recorded a complete history of your work. To see all the changes for the project, you just go to the Git tab and click the History button!
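The same history is available from the terminal. Here is a minimal sketch, assuming a scratch repository with two commits to a placeholder file named analysis.R:

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Your Name"
git config user.email "your@email.com"

# First version
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

# Second version (-a stages changes to files git already tracks)
echo "y <- 2" >> analysis.R
git commit --quiet -am "Add second variable"

# One line per version, newest first
git log --oneline

# Line-by-line differences between the previous version and the latest
git diff HEAD~1 HEAD
```

In the diff output, lines starting with + were added and lines starting with - were removed, matching the green and red highlighting in RStudio.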
Sometimes we make a change that doesn’t work so well in the end. In the event of errors or inconsistencies in your work, you can browse through your history, find the change that’s to blame, and restore your previous good work. It might be a straight-up error, or you might decide that what you wrote isn’t the best way to do something. In this case, we’ll need to revert your change!
- Go to the Git tab in RStudio
- Click Diff
- Select one or more files and view the differences
- Click Revert
- Confirm you actually want to revert your change
The erroneous change has been undone and the previous version restored!
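From the terminal, undoing a change usually takes one of two forms, depending on whether the mistake has been committed yet. The sketch below shows both in a scratch repository; the file name and contents are placeholders:

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Your Name"
git config user.email "your@email.com"

echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "Add working analysis"

# Case 1: a bad edit that has NOT been committed yet
echo "x <- 'oops'" > analysis.R
git checkout -- analysis.R    # throw the edit away, restore the last commit
cat analysis.R                # prints: x <- 1

# Case 2: a bad edit that HAS been committed
echo "x <- 'oops'" > analysis.R
git commit --quiet -am "Break the analysis"
git revert --no-edit HEAD     # add a new commit that undoes the bad one
cat analysis.R                # prints: x <- 1
```

Note that git revert does not erase history: the bad commit stays in the log, followed by a commit that cancels it, so the history ends up with three entries.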
4.4 GitLab (with R!)
Now that we know some git, we can use git repository hosting platforms for collaboration and open science! One of the very best is GitLab.
There are many features that set GitLab apart from other services. It has free and unlimited public and private repositories. It has continuous integration built-in, and you can use either the built-in Docker registry or an image from DockerHub for each repository, no configuration required (simply call the container from the continuous integration!). GitLab also offers free LFS, so we can share larger files within a repository. Another big plus: it integrates with a lot of great tools and services, like JIRA, Kubernetes, and the Open Science Framework.
This assumes everyone has a GitLab account (if not, make one quickly at gitlab.com). When you are logged in, you should be able to see a + sign in the top right-hand corner. This will let you create a new empty repository! You can choose the permission level of the repository – 100% private, internal (private but visible to folks logged into GitLab), or 100% public.
We can add any GitLab repository easily when we start a new RStudio project.
- Go to File > New Project > Version Control
- Choose Git from the dropdown menu
- In the “repository URL” paste the URL of your new GitLab repository. It will be something like this https://gitlab.com/VickySteeves/hello-world.git.
This means that we can sync our local changes to a repository hosted on GitLab! Now, we have to PUSH all our locally created content to the origin remote. This adds one more step to what you already know how to do:
- Work on your files
- Add your files so git knows you want to track their changes
- Commit any changes you want to make the new version
- Send these changes to the repository hosted on GitLab by simply clicking the push button on the Git panel.
Go refresh your browser to see your changes!
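The whole cycle can also be sketched in the terminal. So that the example runs offline, it uses a local bare repository named hosted.git as a stand-in for the repository on GitLab; that stand-in is an assumption of this sketch, and with a real project you would clone your GitLab URL instead:

```shell
work_root="$(mktemp -d)"
cd "$work_root"

# Stand-in for the hosted repository (with a real project you would run
# git clone https://gitlab.com/VickySteeves/hello-world.git instead)
git init --quiet --bare hosted.git

# Get a local copy; git names the place it was cloned from "origin"
git clone --quiet hosted.git hello-world
cd hello-world
git config user.name "Your Name"
git config user.email "your@email.com"

echo "# hello-world" > README.md     # work on your files
git add README.md                    # add them
git commit --quiet -m "Add README"   # commit the new version
git push --quiet origin HEAD         # push the current branch to origin

# List the branches and commits the remote now holds
git ls-remote origin
```

Cloning an empty repository prints a harmless warning; after the push, the remote holds the same commit as your local copy.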
We can keep a copy of our code locally and in this central repository on GitLab. This helps us make sure our code isn’t only stored in one place (our laptops) at any given time. But it also lets us collaborate on code with our colleagues and also strangers!
For our colleagues, we can add them as collaborators within our repository with varying levels of permission - we can even give them an expiration date, if their term on a project ends on a certain date!
Everyone else, to whom we don’t want to give direct push access, must fork our repository and submit a merge request to get their code integrated into ours!
Forks & Merge Requests
A fork is a copy of a repository in your namespace (under your account). Forking a repository allows you to freely experiment with changes without affecting the original project.
A merge request is when you want to integrate the changes you made into the original repository you forked. You describe the changes you made and make sure your changes don’t conflict with the original repo’s code.
The first step to contributing to a code repository where you don’t have push access is to fork it. GitLab has made this as easy as a button click:
You can then choose where you want to put the new repository – into your own account, or a group account!
You can edit, push, pull, add, commit, everything the same as your own code, since it is under your account now!
When you want your changes to be integrated into the official/original repository, you make a merge request! This too, GitLab has made easy for us. Click the ‘Merge Request’ tab on the GitLab sidebar. From there, it’s a simple button click to start your merge request:
Then, GitLab will show you all the changes made and the differences between the code in each repository. You can compare to make sure you don’t have any conflicts! Then, you’ll have to describe all the changes you’ve made to the code:
The last step is simply to submit the merge request and await feedback!
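The git side of this workflow can be sketched locally. Forking and opening the merge request happen on the GitLab website, so here a bare copy named fork.git stands in for the fork; all repository, branch, and user names are hypothetical:

```shell
work_root="$(mktemp -d)"
cd "$work_root"

# Stand-in for the original project, seeded with one commit
git init --quiet original
cd original
git config user.name "Project Owner"
git config user.email "owner@example.com"
echo "# hello-world" > README.md
git add README.md
git commit --quiet -m "Initial commit"
cd "$work_root"

# Stand-in for clicking Fork: a copy of the project under your account
git clone --quiet --bare original fork.git

# Clone your fork and remember which branch it started on
git clone --quiet fork.git my-copy
cd my-copy
git config user.name "Your Name"
git config user.email "your@email.com"
base_branch="$(git rev-parse --abbrev-ref HEAD)"

# Make the change on its own branch and push it to the fork
git checkout --quiet -b improve-readme
echo "A small improvement" >> README.md
git commit --quiet -am "Improve README"
git push --quiet origin improve-readme

# The commits a merge request from this branch would propose
git log --oneline "$base_branch"..improve-readme
```

Working on a separate branch is a design choice rather than a requirement, but it keeps the fork’s default branch identical to the original, which makes later merge requests easier to compare.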
4.5 Do more with GitLab
We can do more than just version our code with GitLab. We can also create websites using GitLab pages, host large files with free git LFS, or manage projects with issues and boards!
Since we’re talking about publishing geoscience papers, we take a closer look at how we can use GitLab and GitLab’s continuous integration features to render R Markdown files in-browser in Section 5.5.
4.6 Further reading
Git:
- Pro Git book: The entire Pro Git book, written by Scott Chacon and Ben Straub and published by Apress (available in many languages!).
- TryGit: enter git commands in-browser to help reaffirm beginner git skills!
- Git: The Simple Guide: step-by-step Git tutorial.
- Think Like A Git: for someone who’s been using Git, but doesn’t feel they really understand it.
GitLab:
- GitLab Community Forums
- GitLab Documentation
- GitLab YouTube Channel: they put a lot of tutorials up on here!
- GitLab CI for R: some common recipes for using GitLab CI for R projects.
GitLab & R:
- Happy Git and GitHub for the useR: course material (under development) by Jenny Bryan
- gitlabR library: access the gitlab API right from R! Provides R functions to access the API of the project and repository management web application gitlab. For many common tasks (repository file access, issue assignment and status, commenting) convenience wrappers are provided, and in addition the full API can be used by specifying request locations.
- Migrating from GitHub to GitLab with RStudio: R-Bloggers post on moving your projects from GitHub to GitLab.
- Handy git functions from usethis