Git
Git is a decentralized version control system and content management tool. It allows developers and teams to manage projects by maintaining all versions of files, past and present, allowing for reversion and comparison; facilitating exploration and experimentation with branching; and enabling simultaneous work by multiple authors without the need for a central file server. It can be used offline for version control and revision history or in conjunction with a remote repository to make working in teams easier and safer.
It is important to note that Git itself is not a tool for backing up files. The loss of a local Git repository in connection with a file system failure is permanent unless a remote copy of the repository exists.
There is a short training video that parallels some of the topics discussed in this page.
Resources
Git binary
While it's possible to use git on most systems without configuration, we strongly recommend using the most recent version of Git, which can be accessed through the git module:
module load git
This should prevent any problems that may arise as a result of version incompatibilities.
Remote Git repositories
The Center for High Performance Computing maintains a GitLab Community Edition server for users who are interested in collaborating and sharing internally. You can log in with your University of Utah credentials (your ID and password).
Update (October 2021): To access CHPC's GitLab instance, gitlab.chpc.utah.edu, you must be on campus IP space. If you are connecting from off campus, you will need to use the campus VPN.
Alternatively, third-party hosting services can be used; some of the most popular are GitHub, GitLab, and Bitbucket. Each has its strengths and weaknesses, so seek out reviews, policies, and recommendations before you start.
Quick reference
This is intended for users who have some experience with Git. If you haven't seen these commands before, consider reading through the brief tutorial first.
Command | Description |
git help operation | Read more about operation (e.g. git help push ) |
git init | Create a Git repository in the current directory (if it doesn't already exist) |
git clone URL destination | Copy the project at URL into the (new) directory destination |
git remote add remote_name URL | Add a remote named remote_name with location URL; the primary remote is typically named origin |
git config user.name "Firstname Lastname" | Set your name to Firstname Lastname (use --global to change this globally) |
git config user.email "firstname.lastname@utah.edu" | Set your email to firstname.lastname@utah.edu (use --global to change this globally) |
git status | Display the status of the current branch (shows which files are present in the staging area) |
git diff --cached | Display what will be committed; alternatively, use git diff to show changes that have not been staged yet (including unresolved conflicts) |
git log --stat --summary | Display an overview of the project history, including the summary (commit message) and changes |
git add filename other_file | Add filename and other_file to the staging area |
git rm --cached filename | Remove filename from the staging area |
git commit -m "message" | Create a new commit with description message; git commit -a automatically commits any modified (but not new) files |
git pull remote_name branch_name | Fetch commits on branch branch_name of the remote remote_name and merge them into the current branch; once an upstream branch is set up, you can use git pull |
git push remote_name branch_name | Push commits on branch branch_name to the remote remote_name; once an upstream branch is set up, you can use git push |
git checkout -b branch_name | Create (and switch to) new branch branch_name |
git checkout branch_name | Switch to (existing) branch branch_name |
git branch | Display the branches available; marks the current branch |
git branch -d branch_name | Delete the branch branch_name |
git merge branch_name | Merge commits in branch branch_name into the current branch (if there are no conflicts) |
Sample usage
This is a small sample of how you might set up a Git repository on GitLab to share your work with others. For an in-depth explanation of the steps, refer to the tutorial section.
- Create or locate a remote repository on GitLab (or another service). The URL of this project will be of the form https://gitlab.chpc.utah.edu/gitlab-user/project-name.
- Create a local repository in a directory on your computer.
- Without an existing (remote) repository:
$ module load git
$ cd your_directory
$ git init
$ git remote add origin https://gitlab.chpc.utah.edu/gitlab-user/project-name
Crucially, gitlab-user is not necessarily your university ID. To determine what should be used here, sign in to GitLab and locate your user ID. This can be changed in your settings and it may be a good idea to use your university ID. You can also refer to the "Create a project on GitLab" section to determine the URL you should use.
- From an existing repository:
$ module load git
$ git clone https://gitlab.chpc.utah.edu/gitlab-user/project-name your_directory
$ cd your_directory
Again, gitlab-user may be something other than your university ID. Refer to the URL of the project on GitLab to determine what to use.
- Stage and commit your files. Refer to the "Edit and stage your files" section for more information about adding files to the index and the "Commit your changes" section for information about commits. You can exclude certain files with the .gitignore file.
$ git add .
$ git commit -m "This is a description of the commit!"
- Push your changes to the remote.
$ git push origin master
If the remote contains commits that you don't have locally, you will need to run the git pull command first. See the "Conflicts" section for an explanation.
Brief tutorial
This is not meant to be a comprehensive guide to Git; in fact, it makes many generalizations and has no mention of many important features. It is meant only to introduce some of the concepts of version control and cover the commands necessary to get started. If you are looking for a more comprehensive tutorial or specific information, please try the official tutorial.
This tutorial assumes you're using, or plan to use, a remote repository on the Center for High Performance Computing instance of GitLab. The process should be very similar for other hosting providers.
Create a project on GitLab
If you plan to share your work with others, you'll likely need a remote repository to ensure availability. If you're using GitLab, this can be done by creating a new "project." The project contains the remote repository and adds additional features, like a description, wiki, and editing tools that can be used in an Internet browser. Each project has a "visibility level" for security. "Private" (default) requires you explicitly grant access to each user who will be working on (or simply viewing or cloning) the project, "Internal" allows all authenticated users to view or clone the project (but editing privileges must still be granted explicitly), and "Public" allows anyone to view or clone the project. It's also possible to create projects for groups of users, which is recommended if you have many projects with similar permissions.
You can use HTTPS or SSH when transferring files to and from your computer. When using HTTPS, you must sign in with your university ID and password (as you would on the GitLab website), while with SSH, you generate a pair of keys and create a single password (passphrase) for them. This decision is largely based on personal preference. The remainder of this tutorial will use HTTPS for consistency.
In most cases, you won't want to use the URL given by the project page when using HTTPS. Instead, use the URL of the project page itself (you can copy it directly from your browser). For instance, instead of https://gitlab-user@gitlab.chpc.utah.edu/gitlab-user/project-name.git, use https://gitlab.chpc.utah.edu/gitlab-user/project-name. This will prompt you for both your username and password when pushing changes to the remote instead of assuming your username (in this case) is "gitlab-user," which is often different from your university ID, which must be used for authentication.
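If a repository was already set up with the URL from the project page, the remote can be pointed at the plain form without cloning again. The following is a minimal sketch using the placeholder names from this page (gitlab-user, project-name); it works in a throwaway directory and never contacts the server.

```shell
set -eu

# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Remote added with the URL form that embeds a username (placeholder values).
git remote add origin https://gitlab-user@gitlab.chpc.utah.edu/gitlab-user/project-name.git

# Switch to the plain project URL so Git prompts for the username instead.
git remote set-url origin https://gitlab.chpc.utah.edu/gitlab-user/project-name

# Show the URL Git will now use for fetch and push.
git remote -v
remote_url=$(git remote get-url origin)
```

In an existing clone, only the git remote set-url and git remote -v lines are needed.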
Create a local repository
Without an existing repository (new project)
To start using Git, you'll need to initialize it in the directory of your project.
$ module load git
$ cd your_project_directory
$ git init
$ git remote add origin https://gitlab.chpc.utah.edu/gitlab-user/project-name
From an existing project
You can copy an existing repository to your own computer with the git clone command.
$ module load git
$ git clone https://gitlab.chpc.utah.edu/gitlab-user/project-name your_project_directory
$ cd your_project_directory
Getting started
Verify that the local repository exists
To verify everything's worked up to this point, run git status in your project directory.
$ git status
On branch master
Initial commit
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        your_files/
nothing added to commit but untracked files present (use "git add" to track)
If this didn't work, you'll receive an error. If this happens, check your version of Git and the directory you're in and try again. The remainder of this tutorial assumes everything is working as intended, so it's best to resolve any issues now.
$ git status
fatal: Not a git repository (or any parent up to mount point /your/home/directory)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Configure your name and email
This step is important, but often neglected. Your commits will be associated with the email address you provide here (including on most third-party hosting services) and your name will help colleagues identify you.
$ git config user.name "Firstname Lastname"
$ git config user.email "firstname.lastname@utah.edu"
If you plan to use the same name and email address for all of your projects, you can configure them globally.
$ git config --global user.name "Firstname Lastname"
$ git config --global user.email "firstname.lastname@utah.edu"
This saves your information and uses it for all projects (unless explicitly changed on a given project).
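To confirm what Git will record, the values can be read back with git config --get. The sketch below uses a scratch repository and the placeholder name and address from this page; a value set inside a repository takes precedence over one set with --global.

```shell
set -eu

# Scratch repository; nothing outside this directory is changed.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Set the identity for this repository only (stored in .git/config).
git config user.name "Firstname Lastname"
git config user.email "firstname.lastname@utah.edu"

# Read the values back exactly as Git will use them for new commits here.
configured_name=$(git config --get user.name)
configured_email=$(git config --get user.email)
echo "$configured_name <$configured_email>"
```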
Edit and stage your files
You can use any editor to modify your files and run git status periodically to view their status. You'll notice that any "untracked" files are listed under a heading that says "use 'git add <file>...' to include in what will be committed"; a command like git add your_file_name.ext will add the file to the staging area, officially called the index, which contains all of the changes that will be made with your commit. If you're satisfied with all of your changes, it's possible to add them to the staging area at once with git add . (note the trailing period). You can exclude files selectively with the .gitignore file. Once you've added files to the staging area, they will be visible when running git status. You can remove files from the staging area with git rm --cached your_file_name.ext (the file itself stays on disk).
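The effect of each command on the staging area can be watched with git status --short, which prints one line per file. This is a small sketch in a scratch repository; the file names notes.txt and scratch.txt are made up for illustration.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Two new files, both untracked at first.
echo "first draft" > notes.txt
echo "scratch work" > scratch.txt
git status --short      # "??" marks untracked files

# Stage both files.
git add notes.txt scratch.txt
git status --short      # "A" marks files added to the staging area

# Unstage one of them; the file remains in the directory.
git rm --cached -q scratch.txt
status_after=$(git status --short)
echo "$status_after"
```

After the last step, notes.txt is still staged and scratch.txt is back to untracked.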
Commit your changes
Once you're satisfied with the state of your project (specifically, the state of the staging area), you should "commit" your changes. This is analogous to saving and backing up your work (it's important to remember that Git alone should not be used to back up files: you could still lose them). You can compare commits and, if necessary, revert a file to its state in an earlier commit. Each commit contains a brief message about the changes that were made, and the easiest way to provide it is with the git commit -m "This is your message" command. You can create as many commits as you'd like before pushing your work to a remote repository.
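Comparing commits and bringing a file back to an earlier state might look like the following. This is only a sketch in a scratch repository: paper.txt and the commit messages are invented, and HEAD~1 simply means "the commit before the latest one."

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Firstname Lastname"
git config user.email "firstname.lastname@utah.edu"

# First commit.
echo "version 1" > paper.txt
git add paper.txt
git commit -q -m "Add first draft"

# Second commit; -a stages the modified (already tracked) file.
echo "version 2" > paper.txt
git commit -q -a -m "Revise draft"

git log --oneline        # two commits, newest first
git diff HEAD~1 HEAD     # what changed between them

# Bring paper.txt back to its state in the previous commit.
git checkout HEAD~1 -- paper.txt
restored=$(cat paper.txt)
echo "$restored"
```

The restored file is placed in the staging area, so a further git commit would record the reversion as a new commit rather than erasing history.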
Push your changes to the remote
Conflicts
Git has no system to prevent collaborators (or even individuals working with multiple branches) from having two entirely different versions of the same file. Comparing and merging documents has always been tedious, but many methods of facilitating collaboration have been developed in recent years to make it easy or even unnecessary. Some software, like the content management system used to edit the website you're reading now, requires that users "check out" a document before editing (much like a library book: once it's been checked out, nobody else can use it). Others, like Google Docs, allow people to work simultaneously and display changes in real time but require a consistent Internet connection and only allow for one version of a file (there's little room for independent testing).
Git's solution is somewhere in the middle: it can be used offline and independently, but it allows users to discuss conflicts and makes finding them much easier. In fact, Git will reject a push until your local branch contains the most recent commits from the remote, which gives you the chance to resolve any conflicts with other versions. In other words, you must git pull the most recent version from the remote before you can git push your own. If there are conflicts, they're identified at this point (use git diff to see them). You should try your best to fix any conflicts manually, because Git does not check the quality of the result: once you've pulled the more recent version and committed the merge, you are able to push your own, regardless of whether you've corrected any problems.
This system allows all developers to work simultaneously without worrying about what others are doing, but it only works if everyone knows how to use it. It's still possible to overwrite someone else's work, but this allows for a much more dynamic development process than other methods and stays out of the way when not needed. For instance, two people can edit the same paper simultaneously. Each time there is a difference in the text, the better option can be chosen, or a new one written, to create an entirely new document with work from both contributors. No time or effort is wasted in comparing text that is the same in both versions.
General process
When you're ready to push your changes, it's generally a good idea to run git pull first. Often, this won't cause any problems and you can proceed with your git push. However, if there are conflicts, you will receive a warning:
$ git pull origin master
Username for 'https://gitlab.chpc.utah.edu': your_id
Password for 'https://your_id@gitlab.chpc.utah.edu':
From https://gitlab.chpc.utah.edu/gitlab-user/project-name
 * branch            master     -> FETCH_HEAD
Auto-merging your_file_name.ext
CONFLICT (content): Merge conflict in your_file_name.ext
Automatic merge failed; fix conflicts and then commit the result.
The file with the conflict will be modified to contain both versions:
<<<<<<< HEAD
This is an example of what it might look like. This is from the first version.
=======
This is from the second version!
>>>>>>> 57a4c537d0cc429794dfed77d02e5a1bfca9d91b
The differences can be identified with the git diff command and should be resolved manually. When you're satisfied with the files, add them to the staging area and create a new commit. Now, you can proceed with git push:
$ git push origin master
If everything worked, your changes should now be available on the remote. Check on GitLab to see if everything worked as expected.
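The whole cycle (rejected push, pull, conflict, resolution, push) can be rehearsed locally without GitLab. The sketch below is an illustration under several assumptions: a bare repository named remote.git stands in for the GitLab project, alice and bob are two made-up clones, and set_identity is a hypothetical helper defined here. The --no-rebase option asks git pull for a merge explicitly, since newer versions of Git may otherwise ask how to reconcile branches that have diverged.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the remote project.
git init -q --bare remote.git

# Hypothetical helper: give a clone a commit identity.
set_identity() {
    git -C "$1" config user.name "$2"
    git -C "$1" config user.email "$2@example.com"
}

# The first author creates the file and pushes it.
git clone -q remote.git alice 2>/dev/null
set_identity alice alice
echo "original line" > alice/paper.txt
git -C alice add paper.txt
git -C alice commit -q -m "Add paper"
branch=$(git -C alice symbolic-ref --short HEAD)   # name of the default branch
git -C alice push -q origin "$branch"

# The second author clones; then both edit the same line.
git clone -q remote.git bob
set_identity bob bob
echo "alice's wording" > alice/paper.txt
git -C alice commit -q -a -m "Alice edits"
git -C alice push -q origin "$branch"

echo "bob's wording" > bob/paper.txt
git -C bob commit -q -a -m "Bob edits"

# Bob's push is rejected, and his pull stops with a conflict.
git -C bob push -q origin "$branch" 2>/dev/null || echo "push rejected"
git -C bob pull --no-rebase -q origin "$branch" || echo "merge conflict"

# Resolve by writing the final text, then stage, commit, and push.
echo "combined wording" > bob/paper.txt
git -C bob add paper.txt
git -C bob commit -q -m "Merge remote changes"
git -C bob push -q origin "$branch"
```

In real use, the resolution step is an edit in your editor that removes the conflict markers; overwriting the file with echo is only a shortcut for the demonstration.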
Branching
Create and use branches
Branches allow developers to work on multiple versions of a project simultaneously. They can be used, for example, to test features that may or may not be included in a project. If it's decided they are to be included in the main version of the project, the branches can be merged simply and issues should be identified (as with potential issues between local and remote files). If the new version of the project isn't needed, the branch can be abandoned or deleted entirely without repercussions.
A new branch can be created with git checkout -b new_branch_name. When it is created, the branch contains the same files and commits as the branch it was created from. You can view available branches and identify the branch you're currently on with the git branch command.
$ git branch
* new_branch_name
  master
Now, if you modify files and commit them, the changes are recorded on the new branch. If you want to switch to a different branch, you can use the git checkout command again, like git checkout master. Be sure to commit your changes on one branch before switching to another.
Merging branches
To merge one branch into another, use the git merge command. Start on the branch you'd like to merge changes into and run git merge other_branch. Everything said about conflicts between local and remote versions of a file holds for branching, too. If commits in both branches change the same part of a file, the conflicts will need to be resolved manually.
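A complete branch life cycle (create, commit, switch back, merge, delete) can be tried in a scratch repository. In this sketch the branch name experiment and the file names are invented, and the starting branch is looked up rather than assumed, because it may be called master or main depending on how Git is configured.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Firstname Lastname"
git config user.email "firstname.lastname@utah.edu"

echo "main text" > paper.txt
git add paper.txt
git commit -q -m "Add paper"
main_branch=$(git symbolic-ref --short HEAD)   # name of the starting branch

# Try an idea on its own branch.
git checkout -q -b experiment
echo "appendix" > appendix.txt
git add appendix.txt
git commit -q -m "Add appendix"

# Switch back; the appendix is not present on this branch yet.
git checkout -q "$main_branch"
ls

# Merge the experiment, then delete the branch, which is no longer needed.
git merge -q experiment
git branch -d experiment
git branch
```

Because only the experiment branch received new commits, this merge needs no conflict resolution.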
Other considerations
Special files
.gitignore
The .gitignore file (located in the top level of the project directory) is used to exclude certain files from most Git operations. The files listed in this file will not be tracked by Git (without explicit instruction). It might be used by a developer who wants to share source code but not binaries, or by a scientist working with sensitive information who wants to publish their tools while ensuring the data itself is not available to the public.
Your .gitignore file uses patterns to exclude files. As a result, if the files you are adding are similar, you can simplify the process. For instance,
experiment.out
testing.out
case1.out
case2.out
might become (assuming all files ending in ".out" are to be excluded)
*.out
You can read more about patterns in the Git documentation.
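To check that a pattern does what you expect, git check-ignore -v reports which rule matches a given file. The following sketch uses a scratch repository with the *.out pattern from the example above; analysis.py is a made-up file that should remain visible to Git.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Ignore every file ending in .out.
echo "*.out" > .gitignore
touch experiment.out case1.out analysis.py

# Ask Git which rule, if any, ignores a given file.
git check-ignore -v experiment.out

# Only .gitignore and analysis.py are reported as untracked.
untracked=$(git status --short)
echo "$untracked"
```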
README.md
The README file (a child of the project directory) describes a project and provides important information to potential users and contributors. It's typically displayed on the main page of a project on services like GitHub and GitLab. Most are written with Markdown syntax and named README.md. This is where people tend to look when searching for information about your project.
Recommendations
While Git can manage binary files, it works best with plain text. For instance, if you were writing a paper, it would be a good idea to use plain text (such as LaTeX) in place of a document created with an editor like Microsoft Word. Documents saved in plain text can be compared far more easily (often side-by-side) and can usually be viewed in a browser without downloading the file.
Try to git pull the most recent version of a project before you start editing it. This way, you won't have to resolve as many conflicts when it comes time to push your changes to the remote.
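One way to see whether anything new is waiting on the remote before editing is to fetch and count the commits you don't have yet. The sketch below rehearses this locally under stated assumptions: remote.git is a bare repository standing in for the GitLab project, and mine and colleague are two made-up clones.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Scratch "remote" and a first clone that publishes one commit.
git init -q --bare remote.git
git clone -q remote.git mine 2>/dev/null
git -C mine config user.name "Firstname Lastname"
git -C mine config user.email "firstname.lastname@utah.edu"
echo "draft" > mine/paper.txt
git -C mine add paper.txt
git -C mine commit -q -m "Add paper"
branch=$(git -C mine symbolic-ref --short HEAD)
git -C mine push -q origin "$branch"

# A colleague pushes a new commit in the meantime.
git clone -q remote.git colleague
git -C colleague config user.name "Colleague"
git -C colleague config user.email "colleague@utah.edu"
echo "more text" >> colleague/paper.txt
git -C colleague commit -q -a -m "Extend paper"
git -C colleague push -q origin "$branch"

# Before editing: download the remote history and count the commits you lack.
git -C mine fetch -q origin
behind=$(git -C mine rev-list --count "HEAD..origin/$branch")
echo "behind by $behind commit(s)"

# Nothing local has diverged, so the pull is a simple fast-forward.
git -C mine pull -q --ff-only origin "$branch"
```

The --ff-only option makes the pull stop instead of merging if local commits have diverged from the remote, which is a conservative choice for a "catch up before I start" step.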