Git Tutorial – Gaurav Sharma's Blog

1. Basic Git Concepts

In this chapter, the key components of git architecture and the important concepts are explained. They are covered in the following sections.

1.1. Repository

A git repository is a database containing all the information needed to retain and manage the revisions and history of a project.

Unlike file data and other repository metadata, configuration settings are not propagated during cloning; instead, they are set and inspected on a per-site, per-user, and per-repository basis.
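The three configuration levels layer on top of each other, and the more specific level wins when the same key is set at more than one level. The following is a minimal Python sketch of that precedence using collections.ChainMap. The setting values are made up, and real git reads each level from its own file (commonly /etc/gitconfig, ~/.gitconfig, and .git/config), which this sketch does not attempt.

```python
from collections import ChainMap

# Hypothetical settings. Real values live in /etc/gitconfig (per-site),
# ~/.gitconfig (per-user) and .git/config (per-repository).
system_cfg = {"core.editor": "vi", "color.ui": "auto"}
global_cfg = {"user.name": "A U Thor", "core.editor": "emacs"}
repo_cfg = {"user.email": "author@example.com"}

# Lookups search the repository level first, then the user level, then the site level.
effective = ChainMap(repo_cfg, global_cfg, system_cfg)

for key in ("core.editor", "color.ui", "user.email"):
    print(key, "=", effective[key])
```

Because none of these dictionaries is part of the repository's object store, nothing here would travel with a clone, which matches the behavior described above.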

Within a repository, git maintains two primary data structures, the object store and the index. Both are stored in a hidden subdirectory named .git at the root of the working directory.

The object store: It is copied during cloning.

The index: It contains transitory information that is private to a repository.

The next section describes them in more detail.

Git Object Types

Blobs
Definition: Each version of a file is represented as a blob.
Properties: They are the following:
1. A blob is opaque: it holds only the file's contents.
2. It does not hold any metadata about the file, not even its name.
Trees
Definition: It represents one level of directory information.
Properties: They are the following:
1. It records blob identifiers, path names, and a bit of metadata about all the files in the directory.
2. It can recursively reference other subtree objects and thus build a complete hierarchy of files and directories.
Commits
Definition: Each commit points to a tree object that captures the state of the repository at the time of the commit.
Properties: They are the following:
1. It holds metadata for each change introduced into the repository, e.g., the author, committer, commit date, and a log message.
2. The root commit has no parent.
3. Most commits have a single parent; merge commits have two or more.
Tags
Definition: It assigns a presumably human-readable name to a specific object, usually a commit.
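To make the links between these objects concrete, here is a minimal Python sketch of how a commit object is serialized: a tree line, zero or more parent lines, author and committer lines, a blank line, and the log message. The helper names (hash_object, make_commit), the identity string, and the timestamps are hypothetical; the "type size NUL" header hashed with SHA1 follows git's loose-object format. The commits point at the empty tree to keep the example short.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """Return the SHA1 ID for an object: sha1(b"<type> <size>\\0" + body)."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def make_commit(tree_id: str, parent_ids: list, ident: str, timestamp: int, message: str) -> bytes:
    """Serialize a commit body. ident is 'Name <email>' (hypothetical values below)."""
    lines = [f"tree {tree_id}"]
    # A root commit has no parent lines; a merge commit would have two or more.
    lines += [f"parent {p}" for p in parent_ids]
    lines.append(f"author {ident} {timestamp} +0000")
    lines.append(f"committer {ident} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


# The empty tree: a tree object with no entries.
empty_tree = hash_object("tree", b"")

ident = "A U Thor <author@example.com>"
root_body = make_commit(empty_tree, [], ident, 1700000000, "Initial commit")
root_id = hash_object("commit", root_body)

child_body = make_commit(empty_tree, [root_id], ident, 1700000100, "Second commit")
child_id = hash_object("commit", child_body)

print(root_body.decode())
print(child_id)
```

Because the parent's ID is part of the child's serialized body, changing any earlier commit would change every commit ID after it.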

Index: Additions, deletions, and edits of files are staged in the index, where they are retained until they are committed. The index can also be modified further before committing.

The index allows a gradual transition from one state to the other.

1.2. Content-Addressable Names

The Git object store is organized and implemented as a content-addressable storage system.

Each object in the object store has a unique name produced by applying SHA1 to the contents of the object (prefixed with a short header giving the object's type and size), yielding a SHA1 hash value. This SHA1 name is also called the object ID or hash code.
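As a concrete illustration, git names a blob by hashing a header ("blob", a space, the content length in bytes, and a NUL byte) followed by the raw content. This minimal Python sketch reproduces that rule with hashlib. The function name blob_id is hypothetical; for the same bytes, the result should match what git hash-object prints.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # The header is b"blob <size>\0". The file name is never part of the hash.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Should agree with: echo 'hello world' | git hash-object --stdin
print(blob_id(b"hello world\n"))
```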

1.3. A Content Tracking System

Git’s content tracking is manifested in two critical ways that differ from other VCSs:

Git’s object store is based on the hashed computation of the contents of its objects, not on the file or directory names from the user’s original file layout.
If two separate files have exactly the same content, git stores a single copy as a blob within the object store.
Git’s internal database efficiently stores every version of every file – not their differences – as files go from one revision to the next.
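A toy in-memory content-addressable store shows why identical files share one blob. The key is derived from the content, so storing the same bytes under two different path names produces one entry, while every distinct version is stored whole. The class ObjectStore is hypothetical and ignores compression and the on-disk layout.

```python
import hashlib


class ObjectStore:
    """Toy content-addressable store keyed by git-style blob IDs."""

    def __init__(self):
        self.objects = {}  # object ID -> complete content (no diffs)

    def put_blob(self, content: bytes) -> str:
        header = b"blob " + str(len(content)).encode() + b"\0"
        oid = hashlib.sha1(header + content).hexdigest()
        # Storing the same content again is a no-op: the ID already exists.
        self.objects.setdefault(oid, content)
        return oid

    def get(self, oid: str) -> bytes:
        return self.objects[oid]


store = ObjectStore()
# Two different paths with identical content map to a single blob.
paths = {
    "README": store.put_blob(b"same text\n"),
    "docs/copy.txt": store.put_blob(b"same text\n"),
    "main.c": store.put_blob(b"int main(void) { return 0; }\n"),
}
print(len(store.objects))  # 2
```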

1.4. Pathname Versus Content

Git treats the name of the file as a piece of data that is distinct from the contents of the file.

Git merely records each pathname and makes sure it can accurately reproduce the files and directories from their contents, which are indexed by hash value.

Git's physical data layout isn't modeled on the user's directory structure. Instead, git's own internal layout can reproduce the user's original layout, and it is more efficient for git's own internal operations.
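To see how names and contents are kept apart, this Python sketch builds tree objects in git's binary tree format: for each entry, the mode, a space, the name, a NUL byte, and the 20-byte object ID, with entries sorted by name. Two trees that give the same blob different names get different IDs while sharing that blob. The helper names are hypothetical, and only regular files (mode 100644) and subdirectories (mode 40000) are considered.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def make_tree(entries: list) -> str:
    """entries: list of (mode, name, object_id_hex). Returns the tree's ID."""

    def sort_key(entry):
        mode, name, _ = entry
        # Directories sort as if their names ended in '/'.
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        # The path name is stored here, in the tree, never in the blob.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return hash_object("tree", body)


readme = hash_object("blob", b"hello\n")
tree_a = make_tree([("100644", "README", readme)])
tree_b = make_tree([("100644", "NOTES", readme)])
print(tree_a != tree_b)  # True: same blob, different names, different trees
```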

1.5. Pack Files

Git uses an efficient storage mechanism called a pack file. To create a pack file, git first locates files whose content is very similar and stores the complete content of one of them. It then computes the differences, or deltas, between similar files and stores just the differences.

Git can take two files from anywhere in the repository, regardless of their path names, and compute deltas between them.
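Git's real pack format uses its own binary delta encoding. The Python sketch below only illustrates the idea with a simplified, hypothetical list of copy and insert instructions built from difflib.SequenceMatcher: the target is rebuilt from the base plus the delta, and only the inserted bytes have to be stored literally.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copy/insert instructions against base (toy format, not git's)."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # reuse a run of bytes from base
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))  # new bytes stored literally
        # "delete" opcodes emit nothing: those base bytes are simply not copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += base[start:start + length]
        else:
            out += op[1]
    return bytes(out)


base = b"".join(b"line %d of a long file\n" % i for i in range(100))
target = base.replace(b"line 42 ", b"LINE 42 ", 1)
delta = make_delta(base, target)
literal = sum(len(op[1]) for op in delta if op[0] == "insert")
print(len(target), literal)
```

Nothing in the sketch depends on where the two files live, which mirrors the point that deltas can be computed between any two similar files.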

2. File Management and the Index

In a typical VCS, editing is done in the working directory and changes are then committed to the repository. Git adds an extra layer, the index, between these two: editing is done in the working directory, changes are accumulated in the index, and the index is then committed to the repository.

The index is thus the set of intended or prospective modifications, and most of the critical work happens before the commit step.

The git index doesn't contain any file contents. It tracks the set of changes that should be committed.

The state of the index can be queried with the git status command.

2.1. File Classifications in Git

The files are classified into three groups: tracked, ignored, and untracked.

Tracked: A tracked file is any file already in the repository or any file that is staged in the index.

To add a new file to this group, run git add filename.

Ignored: An ignored file must be explicitly declared invisible or ignored in the repository, even though it may be present within your working directory.

Untracked: All the files in the working directory that are not in the two groups above are classified as untracked.
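The three-way classification can be modeled in a few lines of Python. This is a simplified sketch with hypothetical file names and patterns: real .gitignore matching has additional rules (negation with !, anchoring with a leading /, directory-only patterns ending in /) that are not handled here. A tracked file stays tracked even if it matches an ignore pattern, so the tracked check comes first.

```python
from fnmatch import fnmatchcase


def classify(path: str, tracked: set, ignore_patterns: list) -> str:
    """Classify a working-directory path as 'tracked', 'ignored', or 'untracked'."""
    if path in tracked:
        # Ignore patterns do not apply to files git already tracks.
        return "tracked"
    name = path.rsplit("/", 1)[-1]
    for pattern in ignore_patterns:
        # Simplification: match the pattern against the base name or the full path.
        if fnmatchcase(name, pattern) or fnmatchcase(path, pattern):
            return "ignored"
    return "untracked"


tracked = {"main.c", "README"}
patterns = ["*.o", "build/*"]  # hypothetical .gitignore entries
for p in ["main.c", "main.o", "notes.txt", "build/app"]:
    print(p, classify(p, tracked, patterns))
```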

2.2. Using git add

The git add command stages a file. If the file is untracked, its classification changes to tracked. A new object is created in the object store (if the file's contents have changed), and the index is updated so that the file name points to the new object in the object store. If used on a directory, all the files in the directory are added to the index recursively.

To find the SHA1 values of the staged files, use the git ls-files --stage command.

To commit, first use git add to update the index, then run git commit. Running git status after git add shows the state of the working directory and the index, and the work that would be done if git commit --all were issued.

When the index is out of sync with the working directory, git add is used. When the repository is out of sync with the index, git commit is used.

Running git commit while the index is out of sync with both the working directory and the repository will not leave the working directory clean: only the staged changes are committed, and any unstaged edits remain in the working directory.
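The add/commit cycle can be modeled with three dictionaries, one per layer. This Python sketch is a hypothetical simplification (no directories, deletions are ignored, and a commit simply copies the index). It shows that git add brings the index in sync with the working directory, git commit brings the repository in sync with the index, and a commit made while unstaged edits exist leaves the working directory dirty.

```python
import hashlib


def blob_id(content: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


class ToyRepo:
    """Hypothetical three-layer model: working directory, index, and last commit."""

    def __init__(self):
        self.working = {}  # path -> file contents in the working directory
        self.index = {}    # path -> blob ID staged for the next commit
        self.head = {}     # path -> blob ID recorded by the last commit
        self.objects = {}  # blob ID -> contents (the object store)

    def add(self, path: str) -> None:
        content = self.working[path]
        oid = blob_id(content)
        self.objects.setdefault(oid, content)  # stored once per distinct content
        self.index[path] = oid                 # the index entry now names that blob

    def commit(self) -> None:
        # A commit records the index; the working directory is not consulted.
        self.head = dict(self.index)

    def status(self):
        """Return (staged, unstaged) sets of paths, ignoring deletions."""
        staged = {p for p in self.index if self.index[p] != self.head.get(p)}
        unstaged = {p for p in self.index
                    if p in self.working and blob_id(self.working[p]) != self.index[p]}
        return staged, unstaged


repo = ToyRepo()
repo.working["a.txt"] = b"version 1\n"
repo.add("a.txt")                       # index in sync with the working directory
repo.working["a.txt"] = b"version 2\n"  # edited again, but not re-added
repo.commit()                           # repository in sync with the index
print(repo.status())                    # (set(), {'a.txt'}): still not clean
```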

Notes on using git commit

2.3. Using git rm

The git rm command removes a file from either:

the index only (git rm --cached filename), or
both the index and the working directory.

It cannot remove the file from the working directory alone; the ordinary rm command does that.

Also, it cannot remove the file from the repository without deleting it from the index. Thus any deletion from the repository goes through a deletion from the index, followed by a commit.

A file that has been accidentally deleted from both the index and the working directory can be recovered with the git checkout HEAD -- filename command.

2.4. Using git mv

The following two are equivalent:

git mv old_name new_name

and

mv old_name new_name
git rm old_name
git add new_name

3. Commits

Now that the index has been explained, git commit should be easy to understand.

3.1. Identifying Commits

Identifying commits is an essential task when using git, and git provides a number of mechanisms to make it easier.

Absolute Commit Names: The most rigorous name for a commit is its own hash identifier. This can be shortened to any prefix that is unique within the repository's object database.
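Resolving an abbreviated name amounts to a prefix search over the object IDs in the database, and it succeeds only when exactly one object matches. This Python sketch uses a hypothetical list of made-up IDs. Real git searches its loose objects and pack files and also enforces a minimum abbreviation length, neither of which is modeled here.

```python
def matching(prefix: str, object_ids: list) -> list:
    """All object IDs that start with the given (case-insensitive) hex prefix."""
    prefix = prefix.lower()
    return [oid for oid in object_ids if oid.startswith(prefix)]


def resolve_prefix(prefix: str, object_ids: list) -> str:
    """Expand an abbreviated ID; it must identify exactly one object."""
    found = matching(prefix, object_ids)
    if len(found) != 1:
        raise LookupError(f"prefix {prefix!r} matches {len(found)} objects")
    return found[0]


# Made-up object IDs; two of them share the prefix 'a1b2'.
ids = [
    "a1b2c3" + "0" * 34,
    "a1b2ff" + "0" * 34,
    "9f8e7d" + "0" * 34,
]
print(resolve_prefix("a1b2c", ids))
print(len(matching("a1b2", ids)))  # 2, so 'a1b2' alone is ambiguous
```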

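Relative Commit Names: Within a single generation, a caret (^) is used to select a different parent. For a merge commit C, C^1 is its first parent and C^2 its second.

The tilde (~) is used to go back before an ancestral parent and select a preceding generation. C~1 is the first parent of C, C~2 is the first parent of that commit, and so on.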
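The two operators compose. Here is a Python sketch that resolves names such as D^^2 or D~2 over a hypothetical in-memory map from each commit to its ordered list of parents (first parent first). The commit labels A, B, C, M, and D are made up.

```python
import re

# Hypothetical history: A is the root, B and C are both children of A,
# M merges B (first parent) with C (second parent), and D follows M.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "M": ["B", "C"],
    "D": ["M"],
}


def resolve(rev: str, parents: dict) -> str:
    """Resolve names such as 'D~2' or 'D^^2' against a commit -> parents map."""
    base = re.match(r"[^^~]+", rev)
    commit, rest = base.group(0), rev[base.end():]
    for op, digits in re.findall(r"([~^])(\d*)", rest):
        n = int(digits) if digits else 1
        if op == "^":
            # ^n picks the n-th parent of this commit (^0 means the commit itself).
            if n > 0:
                commit = parents[commit][n - 1]
        else:
            # ~n walks back n generations, always following the first parent.
            for _ in range(n):
                commit = parents[commit][0]
    return commit


print(resolve("D^", PARENTS))     # M
print(resolve("D^^2", PARENTS))   # C
print(resolve("D~2", PARENTS))    # B
print(resolve("M^2~1", PARENTS))  # A
```

Note that D^^ and D~2 name the same commit, since a bare caret always selects the first parent.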

4. Remote Repositories

A clone is a copy of a repository.

Repositories are connected to each other through remotes. A remote is a reference, or handle, to another repository, reached through the filesystem or over the network. Once remotes are established, data is transferred using a push or pull model.

Remote-tracking branches and repository publishing are two further mechanisms for sharing repositories.

4.1. Bare and Development Repositories

A bare repository has no working directory and should not be used for normal development. It has no concept of a checked-out branch; it is simply the contents of the .git directory. A published repository should be bare.

4.2. Repository Clones

After a repository is cloned, the local development branches of the original repository (stored under refs/heads/) become remote-tracking branches in the new clone (under refs/remotes/). The original repository's own remote-tracking branches, hooks, configuration files, stash, and reflog are not cloned.

Each clone maintains a link back to the original repository but not vice versa.

4.3. Remotes

Repositories

Local or Current.
Remote.

Tracking Branches

Remote Branch: A branch located in the remote repository.
Remote Tracking Branch: A branch in the cloned repository, associated with the remote, with the purpose of tracking changes in that remote repository.
Local Tracking Branch: A branch in the cloned repository, paired with a remote-tracking branch, with the purpose of integrating the local development with the changes in the remote tracking branches.
Local Nontracking Branch: A branch in the cloned repository, on which development is done locally.

During the clone operation, git automatically creates remote-tracking branches for each of the topic branches in the upstream repository.

The local topic branch dev is stored as refs/heads/dev. The remote tracking branches are stored in the refs/remotes/ namespace. The remote tracking branch origin/master is actually refs/remotes/origin/master.

Remote-tracking branches should be treated as read-only.

Refs and Symrefs

A ref is a SHA1 hash ID that refers to an object within the git object store. A ref usually refers to a commit object.

A symref (symbolic reference) points to a git object indirectly, by naming another ref. It is still just a ref.
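Inside the .git directory, a symref such as HEAD is stored as a line of the form ref: refs/heads/master, while an ordinary ref holds an object ID. This Python sketch follows that convention over a hypothetical dictionary of ref files with made-up object IDs; packed refs and git's full lookup rules are ignored.

```python
# Hypothetical contents of ref files under .git/ (object IDs are made up).
REFS = {
    "HEAD": "ref: refs/heads/master",
    "refs/heads/master": "1f2e3d4c5b6a79881f2e3d4c5b6a79881f2e3d4c",
    "refs/remotes/origin/HEAD": "ref: refs/remotes/origin/master",
    "refs/remotes/origin/master": "0a9b8c7d6e5f40310a9b8c7d6e5f40310a9b8c7d",
}


def resolve_ref(name: str, refs: dict) -> str:
    """Follow symrefs ('ref: <target>') until an object ID is reached."""
    seen = {name}
    value = refs[name]
    while value.startswith("ref: "):
        target = value[len("ref: "):]
        if target in seen:
            raise ValueError(f"symref cycle at {target}")
        seen.add(target)
        value = refs[target]
    return value


print(resolve_ref("HEAD", REFS))
print(resolve_ref("refs/remotes/origin/HEAD", REFS))
```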

Git maintains the following special symrefs automatically:

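HEAD
HEAD always refers to the most recent commit on the current branch. When you change branches, HEAD is updated to refer to the new branch's latest commit.

ORIG_HEAD
Certain operations, such as merge and reset, record the previous value of HEAD in ORIG_HEAD just before adjusting HEAD to a new value. ORIG_HEAD can be used to recover or revert to the previous state.

FETCH_HEAD
When remote repositories are used, git fetch records the heads of all branches fetched in the file .git/FETCH_HEAD. It is valid only immediately after a fetch.

MERGE_HEAD
When a merge is in progress, the tip of the other branch being merged is temporarily recorded in MERGE_HEAD.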

The refspec maps branch names in the remote repository to branch names in the local repository. The names of development branches start with the refs/heads/ prefix, and the names of remote-tracking branches start with the refs/remotes/ prefix. By default, all the source branches in a remote repository's refs/heads/ namespace are mapped into your local repository under the refs/remotes/<remote>/ namespace, using names constructed from the remote name and the branch name.

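The syntax of a refspec is [+]source:destination. The source names the ref being sent (a local ref for a push, a ref in the remote repository for a fetch), the destination names the ref to update on the receiving side, and the optional leading + allows the destination to be updated even when the update is not a fast-forward.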

Refspecs are used by git fetch and git push.
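The source-to-destination mapping can be sketched in Python. The refspec +refs/heads/*:refs/remotes/origin/* is the fetch refspec that git clone records for the origin remote. The parsing below is a simplified assumption: it handles at most one * on each side and only records the leading + as a flag.

```python
def parse_refspec(spec: str):
    """Split '[+]source:destination' into (force, source, destination)."""
    force = spec.startswith("+")
    source, destination = (spec[1:] if force else spec).split(":", 1)
    return force, source, destination


def map_ref(spec: str, ref: str):
    """Return the destination name for ref under spec, or None if ref doesn't match."""
    _, source, destination = parse_refspec(spec)
    if "*" not in source:
        return destination if ref == source else None
    prefix, suffix = source.split("*", 1)
    if (ref.startswith(prefix) and ref.endswith(suffix)
            and len(ref) > len(prefix) + len(suffix)):
        matched = ref[len(prefix):len(ref) - len(suffix)]
        return destination.replace("*", matched, 1)
    return None


FETCH_SPEC = "+refs/heads/*:refs/remotes/origin/*"
print(map_ref(FETCH_SPEC, "refs/heads/dev"))  # refs/remotes/origin/dev
print(map_ref(FETCH_SPEC, "refs/tags/v1.0"))  # None
```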

Git Push: To update the remote repository with your changes, use git push to publish them as topic branches in the remote repository. The typical refspec is:

refs/heads/*:refs/heads/*

Here the asterisks ensure that every branch is pushed.

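Without a refspec, the behavior of git push depends on the push.default setting. Under the older matching behavior (the default before Git 2.0), git push sends the commits from all local branches that are also present in the remote repository; since Git 2.0, the default (simple) pushes only the current branch to its upstream branch.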

Thus new branches must be explicitly pushed by name.

Example Using Remote Repositories

4.4. Working with Tracking Branches

4.5. Remote Repository Development Cycle

Cloning a Repository: A git clone command results in two separate repositories.

All the commits from the original are copied into your clone.
The branch named master from the original repository is introduced into your repository as a remote-tracking branch named origin/master.
The new remote-tracking branch is initialized to point to the commit at the head of the remote repository's master branch.
A new local-tracking branch called master is created in your clone.
The new master branch initially points to the same commit as origin/HEAD, which refers to the remote-tracking branch origin/master.
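The steps above can be summarized as a mapping from the original's branches to the refs a fresh clone starts with. This Python sketch is hypothetical: it assumes the remote is named origin and its default branch is master, uses shortened made-up object IDs, and ignores tags and the copying of objects.

```python
def clone_refs(remote_heads: dict, default_branch: str = "master", remote: str = "origin") -> dict:
    """Sketch the refs a fresh clone starts with (tags and object copying ignored)."""
    refs = {}
    for ref, oid in remote_heads.items():
        branch = ref[len("refs/heads/"):]
        # Every branch in the original becomes a remote-tracking branch.
        refs[f"refs/remotes/{remote}/{branch}"] = oid
    # origin/HEAD is a symref naming the remote's default branch.
    refs[f"refs/remotes/{remote}/HEAD"] = f"ref: refs/remotes/{remote}/{default_branch}"
    # Only the default branch gets a local branch, starting at the same commit.
    refs[f"refs/heads/{default_branch}"] = remote_heads[f"refs/heads/{default_branch}"]
    refs["HEAD"] = f"ref: refs/heads/{default_branch}"
    return refs


# Hypothetical original repository with two branches.
remote_branches = {"refs/heads/master": "c0ffee1", "refs/heads/dev": "deadbee"}
cloned = clone_refs(remote_branches)
for name, value in sorted(cloned.items()):
    print(name, "->", value)
```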

5. Repository Management

5.1. Publishing Repositories

Two scenarios are common: an open source development environment, in which many people across the Internet develop the project, and a project developed internally by a private group.

It is strongly recommended to publish only a bare repository.

Repositories with Controlled Access

6. List of Commonly Used Commands

Undoing changes made to a file: git checkout -- filename (for example, git checkout -- index.html)

7. Sample Workflow

Here are some commonly used commands.