7  Version control with Git

Git is a version control system that tracks changes in computer files and stores multiple versions of those files. The lab uses Git to maintain version-controlled documents, data, and code. For shared projects, the repositories are hosted on GitHub and may be private or public.

There are lots of Git resources available online.

7.1 Installing Git

You will install Git in different ways depending on your operating system.

Linux (Ubuntu)

In a terminal, type sudo apt install git.

MacOS

Go to https://git-scm.com/download/mac to download and install Git.

Windows

Go to https://git-scm.com/download/win to download and install Git.
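Whichever route you use, you can confirm that the installation worked by asking Git for its version in a terminal. This is only a quick check, and the version number you see will depend on your system.

```shell
# Print the installed Git version; an error here means Git is not installed or not on your PATH
git --version
```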

7.2 Ignoring files and directories

By default, Git treats every file in the repo as a candidate for version control. But you can give it a list of files or directories that you want it to ignore and not track.

  • Create a .gitignore file in your repository directory.
  • Add directory names (e.g., data) and/or file types (e.g., *.log)—one per line.
  • Find default ignore files at https://gitignore.io.
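The steps above can be sketched as follows. This is a minimal example that assumes a throwaway repo in a temporary directory; the file names (raw.csv, run.log, analysis.R) are made up for illustration.

```shell
set -e
# Throwaway repo in a temporary directory (hypothetical project)
repo=$(mktemp -d)
cd "$repo"
git init -q

# Ignore the data directory and all .log files, one pattern per line
printf 'data/\n*.log\n' > .gitignore

# Create some example files (names are illustrative only)
mkdir data
touch data/raw.csv run.log analysis.R

# Show which pattern causes each path to be ignored
git check-ignore -v data/raw.csv run.log

# Ignored files do not appear among the untracked files
git status --short
```

Here git status should list only .gitignore and analysis.R as untracked, because the other files match a pattern in .gitignore.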

7.3 Using GitHub

Using GitHub is fairly straightforward, and there are lots of help resources on GitHub’s website.

Create a GitHub account

  • Go to https://github.com and click on Sign up.
  • I highly recommend enabling two-factor authentication for your GitHub account.
  • Send me your username when you have created an account, and I will add you to the shared lab group.

Connect your repo to your account

There are two ways to connect securely to GitHub. The first (and easiest) is to use HTTPS. This requires no special work on your part, but you must create and use a GitHub personal access token that is stored on your computer. This is the preferred method for interfacing with GitHub. The second is to use SSH, but this is not recommended unless you really know what you are doing.

This information is drawn from GitHub’s personal access token help page.

Personal access tokens (PATs) are an alternative to using passwords for authentication to GitHub when using the GitHub API or the command line. Create a PAT by going to Settings > Developer settings > Personal access tokens. Generate a new token, name it, optionally set an expiration date and limit the scope. Save the generated token in your password manager (e.g., LastPass).

When you log into the GitHub website, use your GitHub password. When you are interfacing with GitHub through GitHub Desktop, Git, RStudio, etc., use the PAT. Instead of having to log into GitHub with your PAT every time you push or pull, you can store your GitHub credentials/PAT in Git by using the GitHub CLI or Git Credential Manager. If you are using Linux, you can manage GitHub credentials from R.

7.4 Using Git commands

Git is command-line driven software, so it is useful to know how to use a command line for your operating system. To open a command line terminal in Windows, click Start, type cmd, and select Command Prompt. For Macs, open Spotlight, type terminal, and select Terminal. For Linux, type Ctrl-Alt-T. Next, if you’re not familiar with it, you need to learn about working from the command line.

Create a local repository (repo)

  • Open a terminal.
  • Change directories to the directory in which you want to create the repository (e.g., cd projects/this_project).
  • Type git init. This creates a hidden directory on your computer called .git that stores all of the Git files. You have to be able to view hidden files in your operating system (Windows and MacOS) to see this folder.
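The steps above might look like this in a terminal. This sketch uses a temporary directory as a stand-in for your real project folder, so the path is an assumption made only to keep the example self-contained.

```shell
set -e
# Hypothetical project location inside a temporary directory
project="$(mktemp -d)/this_project"
mkdir -p "$project"
cd "$project"

# Turn the directory into a Git repo
git init

# List all files, including hidden ones, to see the new .git directory
ls -a
```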

Clone a remote (GitHub) repository locally

To clone a repo means to copy it from GitHub to your computer. Cloning will create a directory for you, so you don’t need to create a project-specific directory before cloning. To clone the repo, change directories to the main directory where you want the new repo to be and type:

git clone [insert SSH/URL info from remote repository]
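As a rough illustration, the sketch below clones from a small local repo that stands in for GitHub, so it runs without a network connection. With a real GitHub repo, you would pass the HTTPS or SSH URL copied from GitHub instead of a local path. The directory names and the placeholder identity are assumptions for the example.

```shell
set -e
work=$(mktemp -d)

# Stand-in for a repo on GitHub (hypothetical): a local repo with one commit
git init -q "$work/remote_project"
cd "$work/remote_project"
git config user.name "Lab Member"            # placeholder identity for this sketch
git config user.email "member@example.com"
echo "# Remote project" > README.md
git add README.md
git commit -q -m "Add README"

# Change to the main directory where the new repo should live, then clone
mkdir "$work/projects"
cd "$work/projects"
git clone "$work/remote_project"

# Cloning created the remote_project directory and set up the origin remote
ls -a remote_project
git -C remote_project remote -v
```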

Add a local repository to GitHub

If you have already created a repo locally and want to put it on GitHub, you first need to create a new, empty repo in GitHub. Then, copy the URL (either HTTPS or SSH) from GitHub and type:

git remote add origin [insert GitHub repo URL]

Update GitHub from local repository

After you’ve put your repo on GitHub, you’ll want to update it when you make changes locally. To do this, you need to push the local changes to the remote repo by typing:

git push -u origin main

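The term origin is the default name for the remote repo (here, the one on GitHub), and main is the branch that you are pushing. The -u flag links your local main branch to the remote one, so later you can type just git push or git pull.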
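Putting the last two steps together, a first push might look like the sketch below. It assumes a local bare repo as a stand-in for the empty GitHub repo, so that it runs offline; with GitHub you would use the repo URL instead of the local path. The git branch -M main line is included on the assumption that your local branch may not already be called main.

```shell
set -e
work=$(mktemp -d)

# Stand-in for a new, empty repo on GitHub (hypothetical)
git init -q --bare "$work/github_standin.git"

# An existing local repo with one commit
git init -q "$work/this_project"
cd "$work/this_project"
git config user.name "Lab Member"            # placeholder identity for this sketch
git config user.email "member@example.com"
echo "# This project" > README.md
git add README.md
git commit -q -m "Add README"

# Make sure the local branch is named main
git branch -M main

# Connect the local repo to the remote and push
git remote add origin "$work/github_standin.git"
git push -u origin main

# The first line of output shows the local branch and the remote branch it follows
git status -sb
```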

Update local repository from the server

If changes are made on the remote repo, you’ll want to download them to your local repo. For this, we pull.

git pull
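To see what a pull does, here is a small sketch with two clones of the same remote: a collaborator pushes a change and you pull it. The remote is a local bare repo standing in for GitHub, and all names (seed, collaborator, notes.txt) are made up for the example.

```shell
set -e
work=$(mktemp -d)
# Placeholder identity so the commits work on a machine with no Git identity set
export GIT_AUTHOR_NAME="Lab Member" GIT_AUTHOR_EMAIL="member@example.com"
export GIT_COMMITTER_NAME="Lab Member" GIT_COMMITTER_EMAIL="member@example.com"

# Build a stand-in remote that already contains one commit
git init -q "$work/seed"
cd "$work/seed"
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git clone -q --bare "$work/seed" "$work/github_standin.git"

# Two clones of the remote: yours and a collaborator's
git clone -q "$work/github_standin.git" "$work/local"
git clone -q "$work/github_standin.git" "$work/collaborator"

# The collaborator changes a file and pushes it to the remote
cd "$work/collaborator"
echo "second draft" > notes.txt
git commit -q -am "Revise notes"
git push -q

# You download the change and merge it into your local copy
cd "$work/local"
git pull
cat notes.txt
```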

Preserve current state of local repository (commit)

Version control saves the current state of a project (or subset of files). To do this, you first need to ‘stage’ the changes, which just means select the files that you want to save. After staging, you must ‘commit’ the files to actually save the files that you’ve staged. For each commit, you should include a message that describes what that commit does.

  • Stage all changes with git add . and individual files with git add [insert path/filename].
  • Commit changes with git commit -m "[Insert message here]".
  • There is no set rule on when/what to commit, but it is useful to commit fairly frequently, and different file changes can be added to different commits.
  • Commit messages should be active declarations of what changes are in the commit. They should almost always start with a present tense imperative verb (e.g., “Add Cronbach’s alpha analysis”, “Remove redundant plots”, “Replace frequentist t-test with Bayesian t-test”). Additional details can be given if committing in RStudio or other GUIs, but command line commit messages should be short. It takes a bit of practice to learn How to Write a Good Git Commit Message.
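A short sketch of staging and committing is shown below. It assumes a throwaway repo, and the file names and commit messages are only examples of the style described above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Lab Member"            # placeholder identity for this sketch
git config user.email "member@example.com"

# Two changes that belong in separate commits (illustrative files)
echo "alpha <- 0.05" > analysis.R
echo "Meeting notes" > notes.txt

# Stage and commit a single file
git add analysis.R
git commit -q -m "Add analysis script"

# Stage everything that is left and commit it separately
git add .
git commit -q -m "Add meeting notes"

# Review the history, one commit per line
git log --oneline
```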

View remote URL

To see the remote URL for a particular repo, type:

git remote -v

Change remote URL

To change the remote URL for a repo, type:

git remote set-url origin [Insert URL here]
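For example, if a repo were renamed on GitHub, you might check and update the remote URL as sketched below. The URLs are hypothetical placeholders, and nothing here contacts GitHub.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical original remote URL
git remote add origin https://github.com/example-lab/old-name.git
git remote -v

# Point origin at the new (also hypothetical) URL and confirm the change
git remote set-url origin https://github.com/example-lab/new-name.git
git remote get-url origin
```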

Force pull to overwrite local changes

Sometimes, we fall behind in pulling from the remote repo or we make local changes that we don’t want to keep. To overwrite the local changes with what is on the remote repo, type:

git fetch --all

git reset --hard origin/main
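The sketch below shows the effect of these two commands on a throwaway clone. It assumes a local bare repo as a stand-in for GitHub and a made-up file, methods.txt. Be aware that git reset --hard permanently discards local commits and edits to tracked files, while untracked files are left in place.

```shell
set -e
work=$(mktemp -d)
# Placeholder identity so the commits work on a machine with no Git identity set
export GIT_AUTHOR_NAME="Lab Member" GIT_AUTHOR_EMAIL="member@example.com"
export GIT_COMMITTER_NAME="Lab Member" GIT_COMMITTER_EMAIL="member@example.com"

# Build a stand-in remote with a main branch containing one commit
git init -q "$work/seed"
cd "$work/seed"
echo "shared version" > methods.txt
git add methods.txt
git commit -q -m "Add methods notes"
git branch -M main
git clone -q --bare "$work/seed" "$work/github_standin.git"

# Clone it and make a local commit that we decide not to keep
git clone -q "$work/github_standin.git" "$work/local"
cd "$work/local"
echo "unwanted local edit" > methods.txt
git commit -q -am "Try an idea we do not want to keep"

# Download the latest remote state and overwrite the local branch with it
git fetch --all
git reset --hard origin/main

# The file matches the remote version again
cat methods.txt
```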

When things go wrong

When things go wrong, check out Oh Shit, Git!?!.

7.5 Using Git via GUIs

Though Git was developed as a command-line app, there are numerous graphical user interfaces (GUIs) that you can use to run Git commands. For Windows and MacOS, you can use GitHub Desktop, which obviously integrates well with GitHub (Linux users must install a fork). Also, some folks like Git Tower or GitKraken. In addition, RStudio has some core Git features baked in if you use RStudio Projects. It is fairly straightforward to stage, commit, push, pull, and view your history from RStudio. While you should be able to get by with GUIs 95% of the time, there will be times when you need to use the commands in a terminal, so it makes sense to be familiar with using the commands.

7.6 Git sandbox

The lab has a Git sandbox on its GitHub account. Feel free to go play around with Git there.