Some Popular VCSs

Fundamentals

A Version Control System (VCS) is software that helps a group of developers working on a project keep track of their work. Primarily, a VCS

  • allows developers to work simultaneously.
  • restricts unintentional overwrites.
  • maintains version history.

Two types of VCSs are currently in use industry-wide.

  • Centralized VCS
  • Distributed VCS

This article does a good job of explaining the similarities, differences, pros and cons of the two. For the sake of completeness, the difference between them can be summarized as follows.

  • Centralized VCS - There is a single central copy of the project. Everyone working on it has a working copy of the project on their own system. Once a change is ready to be published, it goes into the central copy, from which the others get the updates into their working copies. Typical examples of CVCSs include Subversion and Perforce.

  • Distributed VCS - In theory, there is no single central copy. Everyone involved has their own copy of the repository, into which the changes made in the working copy are committed. In order to publish these updates, the local repositories are synchronized with a shared remote location (this is where bare repos come into play). Without such a shared location, anyone who has added changes to their local repository would have to broadcast information about the update for the others to see it. Git and Mercurial are popular examples of DVCSs.

Advantages of Git, over other available VCSs, could be summarized as follows.

  • FOSS
  • Fast & small
  • Implicit backup
  • Security
  • Powerful hardware is not required
  • Simpler branching

DVCS Terminology

In order to understand how a DVCS works, we need to be familiar with the following terminology.

  • Local Repo: A copy of the remote repo, including its history, kept in the .git directory. All individual changes are committed to the local repo. The files checked out from it for editing form the Working Tree/Working Directory.
  • Staging area: A temporary buffer that keeps track of the files presented for the commit. This allows committing each file separately if needed.
  • BLOB: Stands for Binary Large Object. A blob in Git holds the content of one version of a file. Git stores each blob under the SHA-1 hash of its content, so an existing object is never overwritten by different content.
  • Tree: An object that represents the directory structure.
  • Commit: A snapshot of the repo at a point in time. Every commit is named by its SHA-1 hash and points at its parent commit(s), which allows traversing back through the history, as in a reversed linked list.
  • Branch: A named pointer to the latest commit in a line of linked commits. Git creates a default branch, traditionally named “master”. The branch pointer moves forward every time a commit is added to the branch.
  • Tag: A commit identified only by its SHA-1 hash is usually not very meaningful to a reader. Tags are human-readable names/identifiers that can be assigned to such commits. A tag is not meant to move once it is created, and each tag is associated with exactly one commit.
  • Clone: Creates a local instance of the repo, complete with its history.
  • Pull: Copies the changes in the remote repo to the local.
  • Push: Copies the changes in the local to the remote.
  • HEAD: A pointer to the currently checked-out branch and, through it, to the latest commit of that branch.
  • Revision: Represents a particular version of the source code.
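
The object types in this list can be inspected directly. The following is a minimal sketch, assuming git is installed; the temporary directory, the identity values and the file README.md are hypothetical and used only for illustration.

```shell
set -e
demo=$(mktemp -d)                  # throwaway directory for the demo
cd "$demo"
git init -q .
git config user.name "Demo User"   # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "Add README"

# Ask Git for the type of each object reachable from HEAD
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:README.md)
echo "$commit_type $tree_type $blob_type"

# A blob is named by the hash of its content, so hashing the
# unchanged file again yields the ID that is already stored
blob_id=$(git rev-parse HEAD:README.md)
rehash=$(git hash-object README.md)
```

Here HEAD resolves to a commit, the commit points at a tree, and the tree points at the blob holding the content of README.md.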

Typical Life Cycle

When we are working on an existing repo, the typical life cycle is as follows.

  • Clone the remote repo as a working copy
  • Modify the local copy
  • Review changes
  • Commit
  • Push changes to remote
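
The life cycle above can be sketched end to end without a network connection. This is only an illustration: a local bare repository stands in for the remote, and the directory names, identity values and the file notes.txt are assumptions.

```shell
set -e
demo=$(mktemp -d)
git init -q --bare "$demo/project.git"     # stands in for the remote repo

# 1. Clone the remote repo as a working copy
#    (Git warns that the repository is empty; that is expected here)
git clone -q "$demo/project.git" "$demo/work"
cd "$demo/work"
git config user.name "Demo User"
git config user.email "demo@example.com"

# 2. Modify the local copy
echo "first draft" > notes.txt

# 3. Review changes
git status -s

# 4. Commit
git add notes.txt
git commit -q -m "Add notes"

# 5. Push changes to remote (HEAD stands for the current branch)
branch=$(git symbolic-ref --short HEAD)
git push -q origin HEAD

local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$demo/project.git" rev-parse "refs/heads/$branch")
```

After the push, the branch in the bare repository points at the same commit as the local branch.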

Creation Operations

Bare Repository

By nature, Git is a distributed VCS, which in turn means that, in theory, there is no single central copy of the project. This leaves us with the problem of how to share the committed updates with all the other interested parties. This is where the concept of a bare repository comes into play. A bare repository is a repository without a working tree. Conventionally, it is named <reponame>.git.

A Git repository is initialized with the git init command. To make the new repository a bare repo, the --bare flag is added.

$ mkdir projectname.git
$ git init --bare projectname.git/
Initialized empty Git repository in /<location>/projectname.git/
$ ls projectname.git/
branches        config       HEAD   index  logs     refs
COMMIT_EDITMSG  description  hooks  info   objects
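
Whether a repository is bare can be checked with git rev-parse. The sketch below, which assumes git is installed and uses hypothetical directory names, creates one bare and one regular repository and compares them.

```shell
set -e
demo=$(mktemp -d)
git init -q --bare "$demo/projectname.git"   # bare: no working tree
git init -q "$demo/regular"                  # regular: working tree plus .git

bare_flag=$(git -C "$demo/projectname.git" rev-parse --is-bare-repository)
regular_flag=$(git -C "$demo/regular" rev-parse --is-bare-repository)
echo "bare: $bare_flag, regular: $regular_flag"

# In a bare repo the Git metadata sits at the top level,
# in a regular repo it sits inside the .git directory
ls "$demo/projectname.git"
ls -a "$demo/regular"
```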

Local Repository

This is the local copy of the project, which contains everything related to versioning, along with the version history. A working directory can be turned into a local repo as follows.

$ git init #Initializes the current directory as a repository
Initialized empty Git repository in /<location>/<projectname>/.git/
$ git status -s #Indicates the current status of the files contained in the local repository.
?? README.md

Here, the command git status -s displays a short summary of the state of the current repository (untracked files, changes which are not committed etc.). ?? indicates that the file is not being tracked.
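
As a small sketch of how the short status changes, the following creates a file in a fresh repository and checks its status before and after staging. The temporary directory and the file README.md are assumptions made for the example.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

echo "# demo" > README.md
status_untracked=$(git status -s)   # file exists but is not tracked yet
git add README.md
status_staged=$(git status -s)      # file is now in the staging area

printf '%s\n%s\n' "$status_untracked" "$status_staged"
```

The first line printed carries the ?? marker and the second one the A marker.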

Staging - Adding files to track

Files are added to the staging area with the git add command. The following adds the selected files for tracking.

$ git add <filename1> <filename2> ... <filenameN>

Once the command is issued, git status -s marks the file(s) given as arguments with a green capital letter A.

$ git add .

The above command adds all the files under the current directory (the dot character refers to the current directory) for tracking. The usual practice is to use this method, which adds all files except the files/paths mentioned in the .gitignore file to the staging area. Following are some of the allowed expressions in .gitignore.

  • * - All files in the repository are ignored.
  • .idea - A file/directory with the name .idea is ignored.
  • !.gitignore - DO NOT ignore the file .gitignore.

Once the add command is executed, the output of git status -s shows the added files prefixed with ‘A’, which indicates that the file is “Added” to the staging area.
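
The effect of .gitignore on git add . can be tried out as follows. This is a sketch under assumptions: the file names main.txt and .idea/workspace.xml are made up, and only the .idea pattern from the list above is used.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

mkdir .idea
echo "ide settings" > .idea/workspace.xml
echo "source" > main.txt
printf '%s\n' '.idea' > .gitignore      # ignore anything named .idea

git add .                               # stages everything not ignored
staged=$(git diff --cached --name-only) # paths currently in the staging area
echo "$staged"

# check-ignore exits with 0 when the given path is ignored
git check-ignore -q .idea/workspace.xml && ignored=yes
```

Only .gitignore and main.txt end up staged, while the content of .idea is left out.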

Committing

Staged files can be committed with git commit. The commit message is given after the -m flag; if the flag is omitted, Git opens an editor for it, as a commit message is compulsory. Make sure to put something meaningful as the commit message. In fact, the message can span multiple lines, where the first line is interpreted as the title by most of the Git hosts (GitHub, GitLab etc.).

$ git commit -m "Commit message"
[master (root-commit) df52a51] Commit message
4 files changed, 5 insertions(+)
create mode 100644 .gitignore
create mode 100644 README.md
create mode 100644 ig1.mml
create mode 100644 ig2.mml
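
A multi-line commit message can be given by repeating the -m flag, where each occurrence becomes a separate paragraph. The sketch below assumes a throwaway repository with hypothetical identity values and message text.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "# demo" > README.md
git add README.md

# First -m: title line, second -m: body paragraph
git commit -q -m "Add README" -m "Explain what the project is for."

subject=$(git log -1 --format=%s)   # title of the latest commit
body=$(git log -1 --format=%b)      # body of the latest commit
printf 'title: %s\nbody:  %s\n' "$subject" "$body"
```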

Binding the Local Repo with Remote Repo(s)

The local repo can be bound to a remote repo using the git remote command, as follows.

$ git remote add origin git@github.com:username/reponame.git

Here, origin is a name that uniquely identifies this remote location.

Similarly, the URL of a remote that is already bound can be changed as follows. Here, the remote named origin is given a different URL.

$ git remote set-url origin git@github.com:username/reponame.git

Once this is done, the URLs can be verified as follows. Note that the URL shown here is the one used for SSH access to the remote repository; a remote can also be accessed over HTTPS.

$ git remote -v
origin    git@github.com:<username>/<reponame>.git (fetch)
origin    git@github.com:<username>/<reponame>.git (push)
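
Adding a remote and changing its URL only edits the local configuration, so it can be tried without network access. In this sketch, the placeholder username/reponame from above is reused, and the HTTPS form of the URL is an assumption made for illustration.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

git remote add origin git@github.com:username/reponame.git
url_before=$(git remote get-url origin)

# Point the same remote name at a different URL (HTTPS instead of SSH)
git remote set-url origin https://github.com/username/reponame.git
url_after=$(git remote get-url origin)

git remote -v    # lists the fetch and push URLs of every remote
```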

Pushing

This is the operation where the changes committed to the local repository are pushed to the remote repository. This is achieved with the command git push.

$ git push origin master
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (5/5), 374 bytes | 0 bytes/s, done.
Total 5 (delta 0), reused 0 (delta 0)
To git@github.com:<username>/<reponame>.git
* [new branch]      master -> master

If you check the remote repo after this operation, the files except the files mentioned in the .gitignore should be available there.

Clone Operation

A local instance of a remote repo could be created with cloning as follows, using the command git clone.

$ git clone git@github.com:<username>/<reponame>.git
Cloning into 'repo_test_1'...
remote: Counting objects: 10, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 10 (delta 1), reused 10 (delta 1), pack-reused 0
Receiving objects: 100% (10/10), done.
Resolving deltas: 100% (1/1), done.
Checking connectivity... done.

This would make an exact local copy of the remote repo.

Checkout Operations

Checking out primarily refers to switching between branches. This is achieved using git checkout.

$ git checkout branch_name
Switched to branch 'branch_name'

The branch being checked out may already exist, or it may be a new one. A new branch can be created and checked out with the -b flag of git checkout.

$ git checkout -b branch_name
Switched to a new branch 'branch_name'

Additionally, this operation can also be used to replace/reset files and the changes made to them, as indicated under Reset Operations.

$ git checkout file1.txt
Updated 1 path from the index

The above operation drops all the changes in the working tree for the file file1.txt, and reverts it to the version in the staging area, which matches the latest commit in the local repository unless the file has been staged since.
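
The checkout variants described in this section can be put together in one sketch. The branch name feature, the identity values and the file content are assumptions; the default branch name is read from the repository rather than assumed.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "original" > file1.txt
git add file1.txt
git commit -q -m "Add file1"
start_branch=$(git symbolic-ref --short HEAD)   # name of the default branch

git checkout -q -b feature                      # create a branch and switch to it
on_feature=$(git symbolic-ref --short HEAD)
git checkout -q "$start_branch"                 # switch back to the first branch
on_start=$(git symbolic-ref --short HEAD)

echo "accidental edit" > file1.txt
git checkout -- file1.txt                       # discard the uncommitted change
restored=$(cat file1.txt)
```

The -- separator tells Git that what follows is a path and not a branch name, which avoids ambiguity when the two share a name.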

Performing Changes

After cloning, you can change the working tree. Once you are done with the required changes, you can commit them to the local repository.

When should I commit? Every time you have made some substantial and meaningful changes, you should commit them to the repository.

If new files are created, then those need to be added using the git add command.

Revision History

Git provides two options for reviewing the changes.

  • git log
  • git show

The command git log indicates the revision history for the repository as follows.

$ git log
commit c1f363ad701b9ff98734ee0b01189e3f1edb648e
Author: <username> <user@email.com>
Date:   Mon Sep 26 15:56:10 2016 +0530
			
more changes

commit e94af7e5672302dfd33dec828ad8bbd3c5557dc7
Author: <username> <user@email.com>
Date:   Mon Sep 26 15:20:09 2016 +0530

some more of the files

commit 0bfdd2f3620ca9242f80b37cf485e56251ea7e9f
Author: <username> <user@email.com>
Date:   Mon Sep 26 15:13:59 2016 +0530

further new files

The git show command displays a revision along with its changes. The output varies with the flags given. Optionally, a commit’s SHA-1 hash can be given as an argument to view that commit’s details.

$ git show
commit c1f363ad701b9ff98734ee0b01189e3f1edb648e
Author: <username> <user@email.com>
Date:   Mon Sep 26 15:56:10 2016 +0530

commit message

diff --git a/src/.dmm b/src/.dmm
new file mode 100644
index 0000000..d6f09c3
--- /dev/null
+++ b/src/.dmm
@@ -0,0 +1 @@
+some more garbage to come

Further, git diff can be used to compare commits and revisions. The following shows the usage and an example output.

$ git diff HEAD~1 HEAD
diff --git a/Test.c b/Test.c
deleted file mode 100644
index 3438120..0000000
--- a/Test.c
+++ /dev/null
@@ -1,7 +0,0 @@
-/*created by 0004072
- *on date
- */
-#include <stdio.h>
-void main(){
-       prinf("Good bye cruel worldl\n");
-}
diff --git a/src/test2.tst b/src/test2.tst
new file mode 100644
index 0000000..359bb01
--- /dev/null
+++ b/src/test2.tst
@@ -0,0 +1 @@
+some meaningless text appended to a file for testing purposes

The result contains the comparison between the two commits. There are different options available to compare specific files selected from the two commits.
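
For a quick overview, the same commands accept flags that shorten their output. The sketch below builds a two-commit history in a throwaway repository (file names and identity values are assumptions) and reviews it in condensed form.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > a.txt
git add a.txt
git commit -q -m "Add a.txt"
echo "two" > b.txt
git add b.txt
git commit -q -m "Add b.txt"

git log --oneline                              # one line per commit, newest first
commit_count=$(git rev-list --count HEAD)      # number of commits reachable from HEAD
changed=$(git diff --name-only HEAD~1 HEAD)    # files that differ between the two commits
latest_subject=$(git show -s --format=%s HEAD) # title of the latest commit, without the diff
```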

Update Operation

Update operations are done by the following sequence of steps.

  • Checkout
  • Modify/add/remove files
  • Add the changes to the staging area
  • Commit
  • Push to the relevant branch

Here, the push may be rejected if the local repo is not fully synchronized with the remote repo. In that case, the latest changes have to be pulled first, which is done using the git pull command.

$ git pull origin master
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 7 (delta 2), reused 7 (delta 2), pack-reused 0
Unpacking objects: 100% (7/7), done.
From github.com:<username>/<reponame>
528d725..7560a95  master     -> origin/master
Updating 528d725..7560a95
Fast-forward
hmacsha1.sh   | 6 ++++++
src/test3.tst | 1 +
2 files changed, 7 insertions(+)
create mode 100644 hmacsha1.sh
create mode 100644 src/test3.tst
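
The interplay between push and pull can be reproduced locally with two clones of one bare repository. This is a sketch under assumptions: the clone names alice and bob, the identity values and the file names are made up, and --ff-only is added so that the pull only ever fast-forwards.

```shell
set -e
demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"       # stands in for the remote repo

# First developer publishes an initial commit
git clone -q "$demo/remote.git" "$demo/alice"
cd "$demo/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > shared.txt
git add shared.txt
git commit -q -m "Add shared file"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# Second developer clones, then the first one pushes another commit
git clone -q "$demo/remote.git" "$demo/bob"
echo "v2" >> shared.txt
git commit -q -am "Update shared file"
git push -q origin "$branch"
alice_head=$(git rev-parse HEAD)

# The second clone is now behind and catches up with a pull
cd "$demo/bob"
git pull -q --ff-only origin "$branch"
bob_head=$(git rev-parse HEAD)
```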

Stash Operation

Stashing saves the uncommitted changes made in the working tree and the staging area, so that they can be set aside for later attention. In fact, the stash works as a list of entries that is tracked through $PROJECT_ROOT/.git/refs/stash. Every time the command git stash is executed, a kind of temporary commit is created, which can be referred to using stash@{n}, where n is the zero-based index of the stash entry/commit. This tutorial from Atlassian on Git explains how stashing functions under the hood.

$ git stash
Saved working directory and index state WIP on master: 7560a95 Merge branch 'master' of github.com:<username>/<reponame>

After stashing, the stash entries can be listed using the git stash list command.

$ git stash list
stash@{0}: WIP on master: 7560a95 Merge branch 'master' of github.com:<username>/<reponame>

These stashes can later be restored to continue the work, using the git stash pop command. This applies the most recent stash entry (referred to by stash@{0}) to the working tree and drops the entry from the stash list. Popping stash entries is analogous to popping a stack.

$ git stash pop
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
			
new file:   comp.txt
new file:   src/test3.txt
			
Dropped refs/stash@{0} (15a9d4f9088b151263f8f235330e67d6c9e771e7)

There is a git stash apply as well, which, without a stash reference, applies the most recent stash entry. git stash apply stash@{n}, where n is a non-negative integer, applies the (n+1)th stash entry. Unlike pop, apply leaves the entry in the stash list.
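
A stash round trip can be sketched as follows, assuming a throwaway repository; the identity values and the content written to comp.txt are made up for the example.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > comp.txt
git add comp.txt
git commit -q -m "Add comp.txt"

echo "work in progress" >> comp.txt   # uncommitted change
git stash                             # set the change aside
stash_entry=$(git stash list)         # one entry, named stash@{0}
while_stashed=$(cat comp.txt)         # working tree is back at the commit

git stash pop                         # re-apply the change and drop the entry
after_pop=$(tail -n 1 comp.txt)
remaining=$(git stash list)           # nothing left after the pop
```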

Move & Rename Operations

This is done using the git mv command, as given below. git mv stages the move or rename by itself, so a separate git add is not needed for it.

$ git mv path/of/location/file.name path/of/dest/file.name
$ git mv path/of/location/old.name path/of/location/new.name

The above are nothing more than the typical Linux mv command prefixed with git.

Delete Operation

Deletion follows a similar strategy, using the git rm command.

$ git rm file.name
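
Both git mv and git rm record their change in the staging area right away, which can be seen in the short status. The sketch below assumes a throwaway repository; the file names and identity values are hypothetical.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "a" > old.name
echo "b" > obsolete.txt
git add .
git commit -q -m "Add two files"

git mv old.name new.name      # rename, staged immediately
git rm -q obsolete.txt        # delete from disk and stage the deletion
status=$(git status -s)       # R marks the rename, D the deletion
echo "$status"
```

A git commit at this point would record both the rename and the deletion.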

Reset Operations

Resetting incorrect/accidental changes is possible in Git before the changes are committed. Practically, such changes can reside in two different locations from Git’s standpoint,

  • in the working tree – Unintended deletions or modifications in the working tree can be undone by checking out the required file from the local repo.
  • in the staging area – Changes already added to the staging area can be removed before the commit by checking the files out from HEAD. This involves an additional parameter to git checkout.
$ git checkout HEAD file.name

Apart from that, there is an explicit git reset command, which resets by moving the pointer of the current branch to a given commit. It works at 3 different levels.

  • Soft: moves the HEAD pointer of the current branch to the specified commit. The staging area and the working tree are left untouched, so the changes from the commits that came after it remain staged.
$ git reset --soft HEAD~
  • Mixed: moves the pointer to the specified commit and resets the staging area to match it. Local modifications in the working tree are not affected. git reset defaults to this.
$ git reset --mixed <SOME COMMIT>
  • Hard: moves the pointer to the specified commit, clears the staging area and reverts the local changes in the working tree as well.
$ git reset --hard <SOME COMMIT>
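
The difference between the three levels shows up in the short status after each reset. The following is a sketch in a throwaway repository, with assumed identity values, that undoes one commit step by step.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > file.name
git add file.name
git commit -q -m "First"
echo "two" > file.name
git commit -q -am "Second"

# Soft: the branch moves back one commit, the change stays staged
git reset -q --soft HEAD~
soft_status=$(git status -s)

# Mixed: the staging area is reset too, the change stays in the working tree
git reset -q --mixed HEAD
mixed_status=$(git status -s)

# Hard: the working tree is reverted as well
git reset -q --hard HEAD
hard_status=$(git status -s)
content=$(cat file.name)
```

In the short status, a letter in the first column refers to the staging area and a letter in the second column refers to the working tree.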

Tagging

The idea of tagging is to assign a human-readable and meaningful name to a particular commit. This is particularly useful in software versioning. Git supports two flavors of tags, namely,

  • Lightweight Tag - Acts as a branch that won’t change. Functions as a readable pointer to a specific commit.

  • Annotated Tag - These are stored as full objects in the Git database, with their checksum and other information such as tagger name, email, date and a tagging message. These could be signed and verified with GNU Privacy Guard (GPG).

In general, annotated tags are the preferred mode of tagging, as they store some information along with the tag. If for some reason all this additional information needs to be omitted, for example for a temporary requirement, a lightweight tag can be used instead.

A tag is created with the git tag command. The commit to be tagged can be given either by its commit hash or by using the pointer HEAD.

$ git tag -a "tagname" -m "Message on tag" HEAD

Once the tagging is done, the tag should be pushed to the remote repo as follows. Alternatively, git push --tags pushes all the unpushed tags to the remote.

$ git push origin tag tagname
Counting objects: 1, done.
Writing objects: 100% (1/1), 166 bytes | 0 bytes/s, done.
Total 1 (delta 0), reused 0 (delta 0)
To git@github.com:0004072/repo_test_1
* [new tag]         tagname -> tagname

Tags can be listed using the -l flag.

$ git tag -l

Tags can be deleted from the local repo as follows, with the -d flag and the tag name.

$ git tag -d 'tagname'
Deleted tag 'tagname' (was ff39d14)

Once the tag is removed from the local repo, the deletion has to be pushed for it to take effect in the remote repo.

$ git push origin :V1.0
To https://github.com/user/repo.git
- [deleted]         V1.0

Alternatively, the following can be used to delete tags in the remote. Note that the local tag still has to be removed separately.

$ git push --delete origin tagname
To https://github.com/user/repo.git
 - [deleted]         tagname
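
The two flavors of tags and the push/delete cycle can be sketched against a local bare repository standing in for the remote. The tag names v1.0 and v1.0-light, the file VERSION and the identity values are assumptions made for the example.

```shell
set -e
demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"        # stands in for the remote repo
git init -q "$demo/work"
cd "$demo/work"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$demo/remote.git"

echo "release" > VERSION
git add VERSION
git commit -q -m "Prepare release"

git tag v1.0-light                           # lightweight: a plain name for the commit
git tag -a v1.0 -m "Version 1.0"             # annotated: a full tag object
light_type=$(git cat-file -t v1.0-light)
annotated_type=$(git cat-file -t v1.0)

git push -q origin v1.0                      # publish the annotated tag
tags_after_push=$(git ls-remote --tags origin)

git tag -d v1.0                              # delete locally
git push -q --delete origin v1.0             # delete in the remote
tags_after_delete=$(git ls-remote --tags origin)
```

The lightweight tag resolves straight to a commit object, while the annotated tag resolves to a tag object of its own that in turn points at the commit.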