Collaborating with Git

Back in the day, collaborating with Git meant emailing repositories around to each other, or hosting them on a server only accessible with the terminal. Today, we have Git hosting platforms, which are websites where you can upload a copy of your Git repository and work openly and collaboratively with others.

At some point, you are definitely going to want to put your work on a Git hosting platform. There are many platforms that host Git repositories (like the one holding these materials, which is GitLab!); I’ve listed the four most popular below:

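Table 1: A comparison of the four most popular Git repo hosting platforms.

| Name        | Manager     | Est. | Free Software           | Open Source | Private Repos | Ad-Free |
|-------------|-------------|------|-------------------------|-------------|---------------|---------|
| GitLab      | GitLab B.V. | 2011 | Yes (partial on server) | Yes         | Yes           | Yes     |
| GitHub      | Microsoft   | 2008 | No                      | No          | Paid          | Yes     |
| BitBucket   | Atlassian   | 2008 | No                      | No          | Yes           | Yes     |
| SourceForge | BizX LLC    | 1999 | Yes                     | No          | No            | No      |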

Since we are using Git and Git doesn’t really care where a remote copy of your repository is, no matter where you choose to host your repositories, the commands are the same! So you can defer to your collaborators, or make the case for a hosting platform you feel strongly about. For this session, we’re using GitHub.
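To make that concrete, here is a small sketch showing that a remote is just a nickname for a URL. The folder name demo-repo and both URLs are made-up placeholders, not real projects, and nothing is uploaded anywhere.

```shell
# Work in a throwaway folder so nothing real is touched.
cd "$(mktemp -d)"
git init -q demo-repo
cd demo-repo

# A remote is a nickname for a URL; this URL is a placeholder.
git remote add origin https://github.com/example-user/hello-world.git

# Moving to another host only changes the URL behind the nickname.
git remote set-url origin https://gitlab.com/example-user/hello-world.git

# List the remotes and the URLs they point to.
git remote -v
```

Everything else (add, commit, push, pull) is typed the same way whichever host sits behind origin.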

Create a Collaborative Repo

So everyone please pair up for the next bit! Both of you please log in to GitHub. One of you, please create a new repository called hello-world in GitHub’s interface. It should look something like this:

Creating a new repository in GitHub

Tick the “initialize with a README” checkbox. Everything should look like this before you hit the big green Create Repository button:

Creating a new repository in GitHub

Click Create Repository and you’ll be brought to a fresh repository. Go to the repository settings (click the Settings tab in your repository or go to: https://github.com/<username>/hello-world/settings/access) and then invite your partner to have access to the repository. Your partner will need to go to their email to accept the invitation.

Click the green ✅ if you have added your collaborator successfully or the red ❌ if you need help (and feel free to chat with each other to help one another!).

Organizing in GitHub

I wanted to point out some neat features that GitHub has that can help you organize your group work, or even yourself (I use all these myself just to help me remember things!).

Issues

Issues help you keep track of the work happening on your project. They act much like a to-do list mixed with a discussion forum. In GitHub (and most Git hosting platforms), you can link issues to specific commits or pull requests, or even close issues with specific relevant commits.

I highly recommend that when you plan collaborative work, some of that planning involves writing and posting issues. Issues typically have:

  • A title and description describe what the issue is all about.
  • Color-coded labels help you categorize and filter
  • A milestone acts like a container for issues (I use a conference name as a milestone for instance!)
  • Comments allow anyone with access to the repository to provide feedback (if your repo is public, anyone can comment)

When this is done and you all are ready to work, you then would assign yourself to the issue that you are currently working on. That way we all know what the other is doing. Issues can also be connected to pull requests, where collaborators can directly provide feedback on that work. I’ll go over that in a bit.

I love issues because they often contain some explanation of decisions over the course of a project and are a great way to coordinate with others.

Wikis

Every GitHub repository comes with a Wiki. You can disable that in the repository settings, but Wikis are useful for documentation. I use Wikis as lab notes and as a place to keep meeting minutes for projects that I am working on, so mostly administrative stuff. Most of the time, developers use the Wiki as the main source of documentation about the software the repository contains. That can also be useful for you, as you’ll need to document your research along the way 😁

Projects

This is a Kanban-board-like organization tool that is built into GitHub as well, sort of like Trello but with some GitHub-specific actions and integrations. Folks use these to make roadmaps (to a first version of a piece of software, or to a finished conference paper) and to keep track of a project (using the To Do, In Progress, Review, and Done column setup). From GitHub’s docs:

Project boards are made up of issues, pull requests, and notes that are categorized as cards in columns of your choosing. You can drag and drop or use keyboard shortcuts to reorder cards within a column, move cards from column to column, and change the order of columns. Project board cards contain relevant metadata for issues and pull requests, like labels, assignees, the status, and who opened it. You can view and make lightweight edits to issues and pull requests within your project board by clicking on the issue or pull request’s title.

I also like that you can create notes in columns that can serve as reminders and general useful references (you can even create a reference card for another project board by adding a link to a note, which is really useful for working across repositories!).

I find the setup helpful, especially for folks that prefer to see all the GitHub issues listed out at once (which is kind of hard to do in the list view).

Downloading the repository

Ok, so now that we have a repository to work on and we sort of understand how GitHub can help us organize that collaboration, let’s work on our repository together!

To bring our central repository to our local computer, we can clone it. On GitHub, click the big green Code button and click HTTPS. Then copy the link that it gives you. Do not download ZIP.

GitHub’s download code button

After that, open the terminal, and use cd to navigate to your Desktop (or anywhere else that you will be able to find from the terminal). Once you are in a good place, clone the repository by pasting the link after the command git clone like so:

git clone https://github.com/VickySteeves/hello-world.git

Click the green ✅ if you are doing ok or the red ❌ if you need help (and feel free to chat with each other to help one another!).

After that finishes downloading our repository, cd into the newly created folder with:

cd hello-world/

CHALLENGE:

  1. Type ls -a in the terminal window
  2. Report back in the chat what you see!


Yes, you have all found the .git folder!! It is a hidden sub-folder in your project folder, and it is where Git keeps the entire history of the repository. You probably won’t have to touch this ever, but definitely don’t delete it or your history is GONE. Leave it hidden!
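If you would like to rehearse cloning without touching GitHub, the same command works against any URL or path. The sketch below uses an empty local stand-in called hosted.git (a name made up for this example) in place of the hosted repository, so Git may warn that the clone is empty.

```shell
# Work in a throwaway folder.
cd "$(mktemp -d)"

# Stand-in for the hosted repository (a "bare" repo has no working files).
git init -q --bare hosted.git

# Clone it just as you would clone the HTTPS link copied from GitHub.
git clone -q hosted.git hello-world
cd hello-world

# The clone remembers where it came from under the nickname "origin".
git remote -v

# The hidden .git folder is listed alongside your files.
ls -a
```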

Now that we’ve cloned our central repository to our computers, let’s see how Git works locally. Here you can use a basic two-step workflow to keep track of your changes with Git. Changes must first be added to the staging area, then committed from there. This two-stage process gives us a lot of control over what should and should not be included in a particular commit. This is the workflow you’ll use over and over again locally:

  1. git add filename.extension
  2. git commit -m 'super descriptive commit message'

These two commands, git add and git commit, are required to record all our local changes. These help us track a single file (e.g. git add README.md), a select group of files (e.g. git add *.csv), or everything (e.g. git add .) in the repository.
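Here is a minimal sketch of those three ways of staging, run in a throwaway repository. The file names and the name/email identity are placeholders invented for this example.

```shell
# Work in a throwaway repository.
cd "$(mktemp -d)"
git init -q staging-demo
cd staging-demo

# Placeholder identity so the commits work on a fresh machine.
git config user.name "Workshop Learner"
git config user.email "learner@example.com"

# Hypothetical files to practice on.
echo "# notes" > README.md
echo "id,value" > a.csv
echo "id,value" > b.csv
echo "scratch" > todo.txt

# Stage and commit a single file.
git add README.md
git commit -q -m 'Add README'

# Stage and commit a group of files matched by a pattern.
git add *.csv
git commit -q -m 'Add CSV data files'

# Stage and commit everything that is left.
git add .
git commit -q -m 'Add to-do list'

# One line per commit, newest first.
git log --oneline
```

Splitting the work this way gives three small commits with focused messages instead of one big "added stuff" commit.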

Version Local Files with Git

Ok so let’s make some changes to our repository so we can practice working with Git. I’ve found that having a consistent folder organization scheme across a lab can help with knowledge transfer and with getting going quickly on a new project. This type of templating is commonly useful:

  • Put text documents associated with the project in the docs folder.
  • Put raw data and metadata in the data folder. These data are read-only!
  • Put files generated during cleanup and analysis in the results folder.
  • Put any code or scripts for the project in the src folder.

And while we’re at it:

  • Name all files to reflect their content or function, with NO special characters (!@#$%^*) or spaces! Use underscores or dashes, A-Z, and numbers!

So anyway, let’s make some of these folders using the terminal, and then put some text in them to add and commit with Git!

mkdir docs/ results/ data/ src/

Now use the ls command to see if that worked. You should see something like this in your terminal:

vicky@cagliostro:~/Downloads/hello-world$ ls
data  docs  README.md  results  src

Click the green ✅ if you are doing ok or the red ❌ if you need help (and feel free to chat with each other to help one another!).

Make a branch

So now that we have our repository structure set, let’s create a file that we want to track with Git! We are working collaboratively, so that changes our workflow just a bit. We want to avoid merge conflicts as much as we can because they are really annoying to deal with. So we are going to work on branches. A Git branch represents an independent line of development that will likely someday be merged back into main.

When you work collaboratively, you will work on a branch. The only things you should commit directly to main are tiny fixes, like typos in the README or broken links. Otherwise, you should work on a branch, and then submit a pull request to the repository, even if you are a collaborator on it. Using this workflow helps to make sure your changes don’t conflict with the rest of the work in the repo.

So let’s make a new branch:

git checkout -b <any-short-name>

In your pair, decide who will create a branch about research ideas and who will create a branch adding some data. After you decide, each of you create a branch and do not name them the same.

This is what my output looks like from creating both of those branches (each of you PICK ONE AND ONLY ONE TO DO!):

vicky@cagliostro:~/Downloads/hello-world$ git checkout -b brainstorm
Switched to a new branch 'brainstorm'
vicky@cagliostro:~/Downloads/hello-world$ git checkout -b first-data
Switched to a new branch 'first-data'

Click the green ✅ if you are doing ok or the red ❌ if you need help (and feel free to chat with each other to help one another!).

You can always go back to any branch by using the command git checkout <branch-name>.
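A short sketch of creating a branch and hopping between branches, in a throwaway repository. The identity and the README content are placeholders, and the git symbolic-ref line is only setup so that the first branch is called main on any machine.

```shell
# Work in a throwaway repository.
cd "$(mktemp -d)"
git init -q branch-demo
cd branch-demo
git config user.name "Workshop Learner"      # placeholder identity
git config user.email "learner@example.com"

# Setup only: name the starting branch main, whatever this machine's default is.
git symbolic-ref HEAD refs/heads/main
echo "# hello-world" > README.md
git add README.md
git commit -q -m 'Add README'

# Create a branch and switch to it in one step.
git checkout -q -b brainstorm

# List branches; the asterisk marks the one you are on.
git branch

# Go back to main, then return to the branch.
git checkout -q main
git checkout -q brainstorm
```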

Now that you each have a task to do (one of you is brainstorming a research idea and the other is adding text data!), open up a plain text editor (Vim, Nano, Notepad, Notepad++, TextWrangler, Sublime Text…whichever plain text editor you want) and each one of you create a file:

  1. Open your plain text editor.
  2. Write out the beginning of a research idea or write out some qualitative data as if you were doing an interview. Just a few lines is fine, enough that we can see the differences between versions later.
  3. Save this file as ideas.txt in the docs folder you just created or as data.txt in the data folder. Again, each person in the pair will do one of these.
  4. Go back to the terminal.

Tracking the changes

Now let’s make sure we did that right by going to our terminal and typing in Vicky’s favorite Git command:

git status

This status will tell us that Git has noticed a new file in our directory that we are not yet tracking – the directory where you each saved your text file should be red. We now want to tell Git that we want to track any changes to that directory with git add. This adds the folder and our txt file to the staging area (where Git checks for file changes). Type the following as separate commands:

$ git add <folder-name>
$ git status

The file path should be green now, which is Git visually cueing us to the fact that there is a new file staged and waiting to be committed!
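If colors are hard to tell apart on your screen, git status --short gives the same cues as two-letter codes. This is a small sketch in a throwaway repository; the folder and file contents are placeholders.

```shell
# Work in a throwaway repository.
cd "$(mktemp -d)"
git init -q status-demo
cd status-demo
mkdir docs
echo "An early research idea" > docs/ideas.txt

# "??" means untracked (what plain git status shows in red).
git status --short

# Stage the folder and everything inside it.
git add docs

# "A" means added to the staging area (what plain git status shows in green).
git status --short
```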

A commit records changes to the repository, and is assigned a unique hash that users can leverage for many purposes, like reviewing the history of changes to a repo! The messages we attach to our commits are therefore extremely important. Past us can’t answer emails about our code, and if you want to go through a timeline of development history that only says “updated code” … well chances are you’re not gonna have a good time.

A good commit message is concise, descriptive, and informative. Good commit messages start with a verb in the present tense (“Add interview notes” rather than “Added interview notes”) and aim to be 50 characters or less (and try to avoid screaming!).

Commit messages from XKCD

Since we think it’s fine for a first draft, we can commit to our new version of the text file:

git commit -m 'short descriptive message about activities'

Click the green ✅ if you are doing ok or the red ❌ if you need help (and feel free to chat with each other to help one another!).

Viewing and rewriting history

Another great part of Git is that it tracks that rich historical information. To review what you’ve been up to, type this in the terminal:

$ git log

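This will list your commits with their IDs, date/time of creation, associated person, and commit messages. You should see at least 2 commits right now for your hello-world repository: the initial commit GitHub made when it created the README, and the commit you just made on your branch.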

If you want to only look at the changes to a specific file, enter this command in the terminal:

$ git log filename.extension

This will list changes as before, but only those affecting this file, such as the one we just created! Remember that weird number from git log next to commits? This unique hash allows you to refer to that version, and you can use it to view the differences between files and repository states, as well as rewrite and overwrite your history.
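A few variations on git log that I find handy, sketched in a throwaway repository. The files, commit messages, and identity are placeholders made up for this example.

```shell
# Work in a throwaway repository with two small commits.
cd "$(mktemp -d)"
git init -q log-demo
cd log-demo
git config user.name "Workshop Learner"      # placeholder identity
git config user.email "learner@example.com"
mkdir docs data
echo "An early research idea" > docs/ideas.txt
git add docs
git commit -q -m 'Add first research idea'
echo "Interview 1: placeholder notes" > data/data.txt
git add data
git commit -q -m 'Add first interview notes'

# Compact history: short hash and message, one commit per line.
git log --oneline

# Only the commits that touched one file.
git log --oneline -- docs/ideas.txt

# Use a hash to inspect exactly what a commit changed.
first_commit=$(git rev-list --max-parents=0 HEAD)
git show --stat "$first_commit"
```

In everyday use you would copy the hash straight from the git log output rather than looking it up with git rev-list; that line is only there so the sketch runs without any copying and pasting.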

Uploading changes to GitHub

At the moment our changes are only recorded locally, on our computer. If we wanted to work collaboratively with someone else they would have no way of seeing what we’ve done.

So to make sure that our collaborators can see and use our work, we will then have to upload, or push our local changes to the GitHub repository. We do this using the git push command:

$ git push -u origin <branch-name>

The nickname of our remote repository is origin (you can think of origin like a variable that holds the URL to our GitHub repository) and the default branch name is main. The -u flag tells Git to remember the parameters, so that next time we can simply run git push and Git will know what to do.
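To see what -u buys you without needing a GitHub login, here is a sketch that pushes to a local stand-in for the hosted repository. The names hosted.git and ideas.txt and the identity are placeholders for this example only.

```shell
# Work in a throwaway folder.
cd "$(mktemp -d)"

# Stand-in for the GitHub repository.
git init -q --bare hosted.git

git init -q hello-world
cd hello-world
git config user.name "Workshop Learner"      # placeholder identity
git config user.email "learner@example.com"
git remote add origin ../hosted.git

git checkout -q -b brainstorm
echo "An early research idea" > ideas.txt
git add ideas.txt
git commit -q -m 'Add first research idea'

# First push: -u links the local branch to origin/brainstorm.
git push -q -u origin brainstorm

# Show which remote branch each local branch follows.
git branch -vv

# Later pushes from this branch need no extra arguments.
echo "A second idea" >> ideas.txt
git commit -q -am 'Add second research idea'
git push -q
```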

You may be prompted to enter your GitHub username and password to complete the command. Note that GitHub no longer accepts your account password for Git operations over HTTPS, so when you are asked for a password, paste in a personal access token instead. Pay attention to the terminal, as you may see a URL in there that we are going to use as part of our collaboration. This is what my output looks like: you can see below a link to create a pull request on GitHub. By holding down CTRL, I can click that link. A browser window will open for me to make a pull request, which is exactly what we want to do!

vicky@cagliostro:~/Downloads/hello-world$ git push origin brainstorm 
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 369 bytes | 369.00 KiB/s, done.
Total 4 (delta 0), reused 0 (delta 0)
remote: 
remote: Create a pull request for 'brainstorm' on GitHub by visiting:
remote:      https://github.com/VickySteeves/hello-world/pull/new/brainstorm
remote: 
To github.com:VickySteeves/hello-world.git
 * [new branch]      brainstorm -> brainstorm

So now you will have to fill in some information about what you have changed in your repository, whether it’s documentation, new data, or new code. You should replace whatever pre-filled title there is with a more descriptive one, and write out an explanation of what you did in the discussion box. You can apply labels to pull requests as well. When you have done all that, click the Create pull request button. A pull request is like a ‘submission’ to a repository that is either accepted or closed by the maintainers (people who can push directly to the repository). It has a discussion feature, a review feature, and can connect to issues. It’s helpful for discussing specific work that is done or ongoing (you can have ‘draft’ PRs).

Click the green ✅ if you have made a PR or the red ❌ if you need help (and feel free to chat with each other to help one another!).

Now, your collaborator will need to merge that in. Since you both worked on totally different parts of the repository, you should be able to merge those in without a problem! So, each of you merge in the other one’s pull request. To do that:

  1. Go to the Pull Requests (2) tab in your collaborative repository
  2. Click on the PR that you did not write.
  3. Click the Files changed tab in that PR to check out and verify your colleague’s work
  4. Go back to the Conversation tab in that PR and scroll down to click the big green Merge pull request button

This brings me to a good best practice though: you should really not be merging in your own pull requests, again unless they are small (in which case honestly just push to main). Also, if you are working solo, you really don’t even need a pull request: you can merge the branch into main yourself (which is possible through Git in the terminal) and then push main to GitHub. That way you can work on branches and not mess up any code you know works on main and still not go through a weird self-submitting-and-accepting-PR cycle.
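For the solo case, merging a branch into main from the terminal can look roughly like this. It is a sketch in a throwaway repository with a placeholder identity and made-up file names; the final push is left as a comment because no remote is set up here.

```shell
# Work in a throwaway repository.
cd "$(mktemp -d)"
git init -q solo-demo
cd solo-demo
git config user.name "Workshop Learner"      # placeholder identity
git config user.email "learner@example.com"

# Setup only: start on a branch named main.
git symbolic-ref HEAD refs/heads/main
echo "# hello-world" > README.md
git add README.md
git commit -q -m 'Add README'

# Do the experimental work on a branch.
git checkout -q -b brainstorm
echo "An early research idea" > ideas.txt
git add ideas.txt
git commit -q -m 'Add first research idea'

# Happy with it? Merge it into main locally, no pull request needed.
git checkout -q main
git merge -q brainstorm

# The branch has served its purpose and can be deleted.
git branch -d brainstorm

# From here, "git push origin main" would publish the result.
```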

Staying in sync

So now we have a syncing problem: you have made a change in GitHub (2 PRs were merged in) that is not reflected locally on your computer. This might happen for a number of other reasons, like you might have to work on two different computers and stay synced between them (this happened to me a lot when I used to go from my work computer to my home computer).

To download changes from a remote repository, we use the opposite of push ↪️ pull! Literally just type out:

$ git pull

CHALLENGE:

  1. Download the changes you made to the remote repository so you have them on your local computer.
  2. Use git log to inspect the changes.
  3. Click the green ✅ if you are doing ok or the red ❌ if you need help (and feel free to chat with each other to help one another!).


So with git pull you can remain totally N*Sync between your local computer and GitHub!
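One detail worth knowing: git pull updates the branch you are currently on, so to pick up merged pull requests you will usually want to switch to main first. The sketch below rehearses that with a local stand-in for GitHub and two copies of the repository; hosted.git, partner, me, and the identity are all placeholder names for this example.

```shell
# Work in a throwaway folder.
cd "$(mktemp -d)"
git init -q --bare hosted.git                # stand-in for GitHub

# Your partner's copy creates main and publishes it.
git init -q partner
cd partner
git config user.name "Workshop Partner"      # placeholder identity
git config user.email "partner@example.com"
git symbolic-ref HEAD refs/heads/main
echo "# hello-world" > README.md
git add README.md
git commit -q -m 'Add README'
git remote add origin ../hosted.git
git push -q -u origin main
cd ..

# Your copy, cloned before the next change lands.
git clone -q --branch main hosted.git me

# A merged pull request shows up as a new commit on the hosted main.
cd partner
echo "An early research idea" > ideas.txt
git add ideas.txt
git commit -q -m 'Add first research idea'
git push -q
cd ..

# Back in your copy: switch to main, then pull.
cd me
git checkout -q main
git pull -q
git log --oneline
```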

Workflow overview

For collaborations:

  1. Make a repository on GitHub
  2. Give your collaborators access to that GitHub repository.
  3. Each of you write out issues for the project work
  4. Assign yourself an issue
  5. Clone the repo to your local computer.
  6. Create a branch, then work as normal, adding and committing files as you work
  7. Push your work and open a pull request, which can be in draft mode if you are still working (so no one can merge it in prematurely)
  8. Make sure you also pull the repository frequently to make sure you are always working with the most up-to-date version of the repository there is.

For individuals:

  1. Make a repository on GitHub
    • you can also write issues or use the project boards for organization, though it’s more for your benefit than anyone else’s
  2. Clone it to your local computer.
  3. Work as normal, adding and committing files as you work
  4. Push to GitHub at least once a day, but preferably twice: before and after lunch
  • If anyone wants to collaborate with you, they will need to fork your repository before making a pull request.

An overview of the Git-GitHub workflow

CONGRATS!!

Lastly…CONGRATS! In this part of the session, you all:

  1. Created a GitHub repository with a collaborator
  2. Cloned it to your local computer
  3. Made, added, and committed changes in that repository
  4. Synced your local work to a remote repository hosted on GitHub
  5. Synced remote work on GitHub to your local computer

You should all be very proud of yourselves!!