What you always wanted to know about Git defensive driving… but were too afraid to ask

In the fourth of our series of “What you always wanted to know…” blogs, join Clearvision consultant Philip Armour on his journey down the road of Git defensive driving.

We all know that hardware can sometimes go up in a puff of smoke, users can have finger-trouble and make mistakes, and gremlins can choose an inconvenient time to chew on the cables in the server room.

With that reality in mind, it is generally accepted that every computer system, large or small, needs backups to mitigate such risks. This remains the bottom line when protecting data. Incidentally, I think it is wise to take the view that if you do not have both a backup and a proven method of restoring from it, then you shouldn’t really claim to have a backup at all.

In the context of SCM (Source Code Management) tools, the topic of ‘accident-recovery’ arguably has extra significance, given the value of the software assets they manage and the high costs of system downtime.

However, while they’re essential for managing complex software, it is also true that SCM tools themselves introduce additional risks. Are all users and admins sufficiently experienced and trained in using and configuring the tool? Do the users have the minimum permissions they need to do their job and no more? These types of questions are seemingly endless.

The SCM tools I have worked with in the past were based on big central databases with dedicated admin people and a set of seemingly very sturdy processes to guard against data loss or corruption. After all, a great number of the company’s software eggs were in this one basket.

This system worked reliably, and indeed well, for years on end, although very occasionally there would be downtime. There was one major blip when a database corruption caused the loss of a small amount of data, which led to quite a bit of stress for the admins, but in general it was a dependable system.

Safely distributed

We would expect Git, as a DVCS (Distributed Version Control System), to have even greater built-in safety than the single well-maintained centralised database. This is thanks to the distributed nature of Git, meaning every user, when they clone or pull, gets a copy of the whole repository (including all history and all branches). Therefore when there are many users the chance of any data being truly lost dwindles to almost nothing.

Git does not force you to centralise, but I am sure that many organisations using Git nevertheless regard centralisation as a good model in practical terms, and consequently push their commits to a central Git server such as Gerrit, Atlassian Stash or GitLab. In some sense this provides the best of both worlds: the simplicity and usability of a central server model together with the high degree of built-in backup associated with a DVCS.

Luckily most users of Git are not responsible for keeping the official or blessed repository versions safe and secure. Nevertheless, there are definitely ways that a user can reduce the chances of losing work while working with Git, in other words: by practising the Git equivalent of ‘defensive driving’.

A local tool for local people

Most of what we regularly do in Git is local to where our repository is. If the repository is on a hard drive in your laptop, then you don’t need a network connection to work. Our Git activities stay local until we make a conscious decision to sync up with the outside world (that is, the remote server).

Compare this to the non-distributed SCM world, where, although we can edit files in the working copy while disconnected, that is about all we can do. We couldn’t dream of browsing commit messages, traversing history, doing merges and so on.

Because we can do so much local work in Git before connecting to a network, it also means we can lose an awful lot if our equipment fails, so wherever our repository is, we need to take regular backups of it. One of the things I like about Git is that you only need to copy your repository directory somewhere else to have a full backup.
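As a rough illustration of the copy-based backup described above, here is a minimal shell sketch. The helper name backup_repo and the example paths are hypothetical, and it assumes a POSIX shell with Git installed. In keeping with the point about proving that a backup can be restored, it runs git fsck on the copy rather than trusting it blindly.

```shell
#!/bin/sh
# Sketch: back up a Git repository by copying its whole directory.
# backup_repo is a hypothetical helper name; adjust the paths to your setup.
set -eu

# backup_repo SRC DEST_DIR
# Copies the repository at SRC (working tree plus the .git directory, which
# holds all history, branches, stashes and hooks) to DEST_DIR/<name>-<date>
# and prints the path of the copy.
backup_repo() {
    src=$1
    dest_dir=$2
    target="$dest_dir/$(basename "$src")-$(date +%Y-%m-%d)"
    if [ -e "$target" ]; then
        echo "backup_repo: $target already exists" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    # -R copies recursively; -p preserves file modes and timestamps.
    cp -Rp "$src" "$target"
    # A backup we cannot restore from is not a backup: check the copy's
    # object database before trusting it.
    git -C "$target" fsck --no-progress >/dev/null
    printf '%s\n' "$target"
}
```

For example, backup_repo ~/src/my_project /mnt/backup/git (both paths made up) would create a dated copy under /mnt/backup/git. Copying while a Git command is running in the repository could capture a half-written state, so it is best done while the repository is idle.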

We can also use pushes to the server as a way to back up work, even if the work is unfinished and we do not yet wish to get it merged into an official upstream branch, by pushing to our own personal branch on the server (subject to having permission). This fits in well with the standard workflows of tools like Atlassian Stash, in which you can continuously commit to a personal branch until you are ready to submit a pull request for merging into an official branch.
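A minimal sketch of pushing unfinished work to a personal branch, assuming a remote called origin and a hypothetical wip/<name>/ prefix for personal branches. Your server may use a different convention, or restrict which branch names you are allowed to create.

```shell
#!/bin/sh
# Sketch: back up unfinished work by pushing the current branch to a
# personal branch on the server. The PREFIX naming scheme (for example
# wip/<your-name>) is a hypothetical convention, not a Git requirement.
set -eu

# push_wip_backup REMOTE PREFIX
push_wip_backup() {
    remote=$1
    prefix=$2
    # Name of the checked-out branch; this fails on a detached HEAD.
    branch=$(git symbolic-ref --short HEAD)
    # Push our local commits to refs/heads/PREFIX/BRANCH on the remote,
    # creating that branch if it does not exist yet. Nothing is merged
    # into any official branch.
    git push "$remote" "HEAD:refs/heads/$prefix/$branch"
}
```

Running push_wip_backup origin wip/philip while on my_work would publish it as wip/philip/my_work. If we later amend or rebase those local commits, a further plain push to the same personal branch will be rejected as a non-fast-forward, and we would need one of the force-push options discussed later in this post.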

It is worth mentioning that this approach is not as foolproof as backing up the whole repository by simply copying files, because some parts of our Git setup (for example, client-side hook scripts and local stashes) are not uploaded to the server when we push. It works best when combined with a daily file-based backup.

Keep calm and commit (and branch) regularly

In between our backups and pushes to the server, we can also adopt ways of operating which make losing our work very unlikely.

One way is to create commits often. These commits don’t need to represent completed activities or have textbook commit messages; since we’re working locally, no one else needs to see them, and we have the chance to tidy them up before sharing them with others. The point is that work which has been committed is generally safe and easy to get back to. If a commit is part of a branch or has been tagged, then we will always be able to go back to it if we need to. Even if a commit is dangling (not pointed to by any branches, tags or other commits), there should still be some days or even weeks before garbage collection tidies it away (by default, Git keeps recent unreachable objects for two weeks, and reflog entries for unreachable commits for 30 days).
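One way to tidy a run of rough local commits before sharing them is to squash them with a soft reset (interactive rebase with git rebase -i is another common option). A minimal sketch, where squash_wip is a hypothetical helper and UPSTREAM is whichever branch our work started from:

```shell
#!/bin/sh
# Sketch: squash a run of rough local commits into one tidy commit before
# sharing them. squash_wip is a hypothetical helper; UPSTREAM is the branch
# our work started from (for example origin/develop).
set -eu

# squash_wip UPSTREAM MESSAGE
squash_wip() {
    upstream=$1
    message=$2
    # The commit where our work forked from UPSTREAM. Using the merge base,
    # rather than UPSTREAM itself, means newer upstream commits are not
    # accidentally reverted by the squash.
    base=$(git merge-base HEAD "$upstream")
    # Move the branch back to the fork point; --soft keeps all of our
    # changes staged in the index and leaves the working tree alone.
    git reset --soft "$base"
    # Record everything that is staged as a single commit.
    git commit -q -m "$message"
}
```

For instance, after several git commit -am "WIP" snapshots, squash_wip origin/develop "Add retry logic to the uploader" (a made-up message) would leave a single commit on top of the point where we branched off. The original WIP commits are not destroyed straight away: they remain reachable through the reflog for a while.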

As well as committing regularly, we can make use of local branches as the second aspect of our Git defensive driving technique. Local branches are a killer feature of Git, and one that no user should overlook.

In general terms, local branches let us context-switch effortlessly between different development activities in our repository, but they can also give us the equivalent of an ‘undo’ when performing potentially risky Git commands, as well as protecting our commits from the garbage collector.

Sam Livingston-Gray describes this concept really well on his site think-like-a-git. He has named one such method the ‘Scout Pattern’.

The assumption here is that we are about to perform a ‘tricky operation’ in Git (perhaps a complex merge), so we’re apprehensive about things going pear-shaped. We are probably already doing our work on a local branch (one which does not correspond to an upstream branch on the remote). Let’s call that local branch my_work.

First, we make sure all our current work is safe by creating a commit. Then we create and switch to a new local branch:

git checkout -b could_go_wrong

We then do our risky rebase, merge or similar. If the command runs smoothly and we are happy with the result, we can merge the could_go_wrong branch into the parent branch, my_work. If we are not so happy with the outcome, we can simply switch back to my_work (after aborting the operation with git merge --abort or git rebase --abort if it stopped part-way), and we are back where we were before.

To keep things tidy, we finally delete our could_go_wrong branch with git branch -D could_go_wrong.
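The steps above can be strung together in a small shell function. This is only a sketch: it treats “the merge completed without conflicts” as a stand-in for “we are happy with the result”, which in real life is a human judgement, and the function name scout and its TARGET argument are hypothetical. It assumes we start on my_work (or whichever work branch is checked out).

```shell
#!/bin/sh
# Sketch of the Scout Pattern as a shell function. "scout" and TARGET are
# hypothetical names; the merge stands in for any risky operation, and
# "merged without conflicts" stands in for "we are happy with the result".
set -eu

# scout TARGET: try merging TARGET on a scout branch, and only move the
# current work branch if the attempt succeeds.
scout() {
    target=$1
    work=$(git symbolic-ref --short HEAD)      # e.g. my_work

    # 1. Make sure our current work is safe by committing it.
    if ! git diff --quiet HEAD; then
        git commit -q -am "WIP before risky merge"
    fi

    # 2. Create and switch to the scout branch.
    git checkout -q -b could_go_wrong

    # 3. Attempt the risky operation on the scout branch only.
    if git merge -q --no-edit "$target"; then
        # 4a. Happy: fast-forward the work branch to the scout's result.
        git checkout -q "$work"
        git merge -q --ff-only could_go_wrong
    else
        # 4b. Not happy: abort, then switch back; the work branch is untouched.
        git merge --abort
        git checkout -q "$work"
    fi

    # 5. Tidy up the scout branch.
    git branch -D could_go_wrong >/dev/null
}
```

Running scout develop while on my_work either fast-forwards my_work to the successful merge or leaves it exactly where it was.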

Immutable but changeable

Operations such as git reset and git rebase often rewrite our local Git history. Suppose our starting commit history looks like this:

[Figure: git log output showing the starting commit history]

After executing:

git reset --hard HEAD^^

our history looks like this:

[Figure: git log output after the reset]

It now looks like we have lost commits 1bb3889 and c9717e5 because our reset has moved HEAD to the commit before c9717e5, which makes the two missing commits unreachable from HEAD.

Imagine we want to undo our git reset.

Here is where a command called git reflog can be really useful:

[Figure: git reflog output]

The reflog is Git’s record of the commits that HEAD has pointed to, and it is updated each time HEAD changes.

So we can use git reflog to look up the commit IDs from before our git reset. In the case above, the command:

git reset --hard HEAD@{1}

will undo our git reset. Note that the reflog only records commits, so any uncommitted changes to tracked files that git reset --hard discarded cannot be recovered this way; hence the importance of making local commits regularly.
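To see this for ourselves without risking a real repository, the following sketch replays the scenario in a throwaway directory created with mktemp. The identity, file name and commit messages are made up, and the commit IDs will of course differ from 1bb3889 and c9717e5 above.

```shell
#!/bin/sh
# Sketch: replay the reset-and-recover scenario in a throwaway repository.
# The identity, file name and commit messages are made up for the demo.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
tip=$(git rev-parse HEAD)              # the commit we are about to "lose"

git reset -q --hard HEAD^^             # back two commits: only "commit 1" remains
git --no-pager log --oneline           # commits 2 and 3 no longer appear
git --no-pager reflog -n 2             # HEAD@{0} is the reset, HEAD@{1} the old tip

git reset -q --hard 'HEAD@{1}'         # undo the reset using the reflog
echo "HEAD is back at $(git rev-parse --short HEAD), tip was $(git rev-parse --short "$tip")"
```

The quotes around HEAD@{1} are not needed in most shells, but they do no harm and avoid surprises in shells that treat braces specially.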

Force push with the safety on

Perhaps one of the scariest things to do in Git is to force push to a remote branch that others are collaborating on. Force pushes can cause, and sometimes have caused, the unintentional deletion of large amounts of work, so this operation should be performed with great care.

However, there are situations where a force push is the right thing to do, and a nice tip for reducing the risk a little is to use --force-with-lease instead of --force. In other words:

git push --force-with-lease origin develop

instead of:

git push --force origin develop

A typical risk associated with force pushing to an upstream branch is that we unintentionally wipe out recently pushed commits that are not in our local copy of the repository. In a previous Git blog, it was mentioned that a remote-tracking branch (actually a reference maintained in our local repository) records the state of the remote upstream branch as of our last fetch or push. The --force-with-lease option is safer because, before overwriting anything, it checks that the branch on the server still points at the commit our remote-tracking branch records. If someone else has pushed commits since then, the two no longer match and Git refuses the push. This stops us committing the sin of unintentionally overwriting other people’s work.
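The difference is easy to demonstrate with throwaway repositories. In this sketch (directory names, identity and commit messages are all made up) a colleague pushes to develop, we then rewrite our stale copy of develop, and --force-with-lease refuses the push, whereas --force would have discarded the colleague’s commit.

```shell
#!/bin/sh
# Sketch: --force-with-lease refusing to overwrite a colleague's push.
# Everything lives in a throwaway directory; all names are made up.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
git init -q --bare "$top/server.git"

# We create the shared develop branch and push it.
git clone -q "$top/server.git" "$top/me" 2>/dev/null
cd "$top/me"
echo v1 > app.txt
git add app.txt
git commit -q -m "Initial commit"
git checkout -q -b develop
git push -q origin develop             # also records origin/develop locally

# A colleague pushes a new commit to develop; we do not fetch it.
git clone -q --branch develop "$top/server.git" "$top/colleague"
cd "$top/colleague"
echo fix >> app.txt
git commit -q -am "Colleague's fix"
git push -q origin develop
theirs=$(git rev-parse HEAD)

# We rewrite our stale develop and try to force push it.
cd "$top/me"
git commit -q --amend -m "Initial commit (reworded)"
if git push -q --force-with-lease origin develop 2>/dev/null; then
    lease=accepted
else
    lease=rejected   # origin/develop no longer matches the server: stale info
fi
echo "force-with-lease was $lease; server develop is $(git ls-remote origin refs/heads/develop | cut -f1)"
```

One caveat from the git push documentation: the lease is only as good as our remote-tracking branch. If a tool runs git fetch in the background, origin/develop is updated without us having looked at the new commits, and the lease check then passes. Git 2.30 and later also offer --force-if-includes to help close that gap.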

If we have to update the upstream branch in this way, it is also essential to tell other people who use the branch what we are doing, so that they can re-synchronize their repositories with the updated upstream commits.
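For a collaborator, one defensive way to re-synchronise is to park the old position of the branch first and only then move it to the rewritten upstream. A minimal sketch, assuming the branch is develop, the remote is origin, and the collaborator has no local commits on develop that still need pushing; resync_develop and develop_before_sync are hypothetical names.

```shell
#!/bin/sh
# Sketch: re-synchronise a local develop branch after someone has force
# pushed a rewritten develop. Assumes the remote is origin and that we have
# no unpushed commits on develop that we still need. resync_develop and
# develop_before_sync are hypothetical names.
set -eu

resync_develop() {
    git fetch -q origin                    # pick up the rewritten commits
    git checkout -q develop
    # Safety net: keep our old develop reachable under another name, in the
    # same spirit as the Scout Pattern's could_go_wrong branch.
    git branch -f develop_before_sync develop
    # Make develop match the rewritten upstream branch exactly. This also
    # discards uncommitted changes to tracked files, so commit or stash first.
    git reset -q --hard origin/develop
}
```

Once we are satisfied that nothing of value was left behind, the safety branch can be deleted with git branch -D develop_before_sync.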

A final thought

So that concludes our brief exploration of ‘defensive driving’ in Git. Following these tips should make both our valuable (hard-coded?) software and our co-workers safer when we are using the tool.

And perhaps more important than anything else…

… Don’t forget about making a backup!
