Brexgit! How to break up or unite Git repositories

Clearvision Technical Consultant Philip Armour tackles the topic of "Brexgit": how to split a Git repository into smaller ones, or combine several repositories into one.

Divide and Conquer?

Unity and connectivity are usually positive things, but as we’ve seen recently, sometimes people want to split up and be more independent. The same can be true in software. If you are using Git, it is well worth understanding how a repository can be split up into several smaller ones or, conversely, what to do when you want to combine several smaller repositories into one.

In this post we explore both of these methods, but it is perhaps first worth comparing the positives and negatives of having a multiple repository set-up versus a monolithic one.

Good reasons for using multiple repositories include:

Wishing to split the software along functional or architectural lines into independent logical components
Wishing to split the software along organizational lines – according to ownership and how access is to be granted (different teams working on different repositories)
Fitting a microservices architecture, where each service of a complex application can live in its own repository

… Or Better Together?

I may not have been on the winning side in the Brexit vote, but when it comes to Git, I find it much easier to convince people of the tangible benefits of the monolithic approach (one big repo for all related software), assuming there are no compelling reasons to avoid it in a specific case. The main benefit is that you sidestep the cross-repository dependency headache, which might otherwise push you towards adding complexity to your workflow with one of the following ‘solutions’:

Git submodules, which have a bad, bad reputation for being complex (even among Git evangelists it is hard to find anyone who recommends them)
Git subtrees, which are easier to use but effectively move you towards the mono-repository
repo (a tool originating from Google’s Android development), an interesting option, although it is unclear how well it works with Git-hosting solutions other than Gerrit
There are other options, but these are arguably the main ones.

When Repos Collide

Let’s get back to methods of splitting and combining Git repos. Combining Git repositories is, on the surface, easy. In fact it can even be done without changing the commit histories (SHA1s) of either repository. It is also a great illustration of the flexibility of the Git DAG (directed acyclic graph): if you want to ‘import’ commits from one (completely different) repo into another repo, just add the ‘foreign’ repo (if you are not suspicious of foreign repos) as a ‘remote’ and fetch or pull the commits.

Imagine we wish to combine repository foo with a completely different repository bar while keeping the full commit history of both. For simplicity, we consider only the case where each repository has a single master branch. In more realistic cases, the procedure must also take other branches into account.

Our starting point is depicted below:

[Figure: Starting point]

When you have a local clone of a Git repository, there is usually a remote (called origin by default) that points to the repository you cloned from. For our local clone of foo, let’s assume that origin has the address git@bitbucket.org:example/foo.git

We now define a new remote in foo that points to the second repository, bar. Let’s say its address is git@bitbucket.org:example/bar.git

Now, from foo, we run the following Git commands:

git remote add bar_repo git@bitbucket.org:example/bar.git
git checkout master
git fetch bar_repo master

The result is depicted in the following diagram. I think it is pretty cool that Git does not mind at all that there are two completely separate histories in its database.

[Figure: Commits from bar]

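The final step is to connect the two histories with a merge. Since Git 2.9, git merge refuses to combine unrelated histories unless you pass the --allow-unrelated-histories option:

git merge --allow-unrelated-histories bar_repo/master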

Which results in:

[Figure: After merging]

So we have successfully merged both histories without changing any commit IDs. We can now remove the bar_repo remote with git remote remove bar_repo, which also deletes the bar_repo/master remote-tracking branch.
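To try the whole procedure without touching real repositories, here is a minimal sketch that assumes a POSIX shell and Git 2.9 or later. It builds two throwaway local repositories standing in for foo and bar (local paths replace the Bitbucket URLs; the file names and the Demo identity are hypothetical), confirms that the fetched history is unrelated, merges it, and checks that both original tips survive unchanged as the merge commit’s two parents.

```shell
#!/bin/sh
set -e

# Hypothetical identity so the demo commits work on a fresh machine.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"

# make_repo NAME: create a one-commit repository whose branch is master.
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master  # ignore init.defaultBranch
    echo "$1" > "$1/$1.txt"
    git -C "$1" add "$1.txt"
    git -C "$1" commit -q -m "Initial commit in $1"
}
make_repo foo
make_repo bar

cd foo
# The steps from the post, with ../bar in place of the bar URL.
git remote add bar_repo ../bar
git checkout -q master
git fetch -q bar_repo master

# Two root commits and no common ancestor: the histories are unrelated.
git rev-list --max-parents=0 master bar_repo/master
git merge-base master bar_repo/master || echo "no common ancestor"

# Record both tips before merging.
foo_tip=$(git rev-parse master)
bar_tip=$(git rev-parse bar_repo/master)

git merge -q --allow-unrelated-histories -m "Merge bar into foo" bar_repo/master

# The merge commit's parents are the original, unmodified commits.
[ "$(git rev-parse HEAD^1)" = "$foo_tip" ] && echo "foo commits unchanged"
[ "$(git rev-parse HEAD^2)" = "$bar_tip" ] && echo "bar commits unchanged"

# Removing the remote also deletes its remote-tracking branch bar_repo/master.
git remote remove bar_repo
ls
```

Both files end up side by side at the top level of foo, which is exactly the mish-mash risk discussed next.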

The only downside of combining two repos with a merge as described above is that you may end up with a mish-mash of files and directories at the top level. This is arguably a reason to always keep the content of each Git repository in a single top-level subdirectory. Where that is not the case, you can first restructure the repos with a tool like git-filter-branch before combining them, accepting that this will rewrite all commit IDs in the restructured repository.
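Here is a minimal sketch of that restructuring step, assuming a POSIX shell, Git 2.9 or later, and the same kind of throwaway foo and bar repositories as before (hypothetical file names, local paths instead of URLs). It rewrites a clone of bar with git filter-branch --tree-filter so that every commit keeps its files under a bar/ subdirectory, then merges the result into foo. The directory name bar/ is an assumption, and the filter assumes bar has no top-level entry already called bar. Note that the Git documentation now discourages git filter-branch in favour of the separately installed git filter-repo, whose --to-subdirectory-filter option does the same job.

```shell
#!/bin/sh
set -e

# Hypothetical identity for the demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
# Skip filter-branch's deprecation warning and its pause.
export FILTER_BRANCH_SQUELCH_WARNING=1

cd "$(mktemp -d)"

# make_repo NAME: create a one-commit repository whose branch is master.
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$1" > "$1/$1.txt"
    git -C "$1" add "$1.txt"
    git -C "$1" commit -q -m "Initial commit in $1"
}
make_repo foo
make_repo bar

# Rewrite a clone, so the original bar repository is left untouched.
git clone -q bar bar-restructured
cd bar-restructured
# In every commit on master, move each top-level entry into bar/.
git filter-branch --tree-filter '
    mkdir -p bar
    for f in * .[!.]*; do
        if [ -e "$f" ] && [ "$f" != bar ]; then mv "$f" bar/; fi
    done
' -- master
cd ..

# Combine as before, this time from the restructured clone.
cd foo
git remote add bar_repo ../bar-restructured
git fetch -q bar_repo master
git merge -q --allow-unrelated-histories -m "Merge bar (under bar/) into foo" bar_repo/master
git remote remove bar_repo
ls -R
```

The merged history now carries rewritten copies of bar’s commits, so their SHA1s no longer match the original bar repository, while foo’s commits are untouched.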

“Brexgit”!

Finally, the ‘Brexgit’ option. Splitting up a Git repo is also done with git-filter-branch, for which there are standard cookbook recipes; Atlassian, for example, documents one such method.

This also results in rebuilding the commit histories and modifying all SHA1s.
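As an illustration, here is a minimal sketch of one such recipe, assuming a POSIX shell: git filter-branch --subdirectory-filter keeps only the history that touches one directory and makes that directory the new project root. The “big” repository and its frontend/ and backend/ directories are hypothetical stand-ins, the Demo identity is made up, and the filtering runs on a fresh clone so the original repository is left alone.

```shell
#!/bin/sh
set -e

# Hypothetical identity for the demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
# Skip filter-branch's deprecation warning and its pause.
export FILTER_BRANCH_SQUELCH_WARNING=1

cd "$(mktemp -d)"

# A hypothetical "big old repository" holding two components.
git init -q big
cd big
git symbolic-ref HEAD refs/heads/master
mkdir frontend backend
echo "ui" > frontend/app.txt
echo "api" > backend/server.txt
git add .
git commit -q -m "Add frontend and backend"
echo "more ui" >> frontend/app.txt
git commit -q -am "Change frontend only"
echo "more api" >> backend/server.txt
git commit -q -am "Change backend only"
cd ..

# Split backend/ out of a fresh clone of the big repository.
git clone -q big backend-repo
cd backend-repo
git remote remove origin  # cut the link back to the big repository
# Keep only commits touching backend/, with backend/ as the new root.
git filter-branch --prune-empty --subdirectory-filter backend -- master
ls
git log --format=%s
```

The frontend-only commit drops out of the new history, and the remaining commits get new SHA1s. git filter-branch also keeps a backup of the old branch under refs/original/, so the old objects stay in this clone until that ref is deleted and the repository is garbage-collected.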

If, after the split, you decide that things weren’t really so bad with the big old repository after all, you can always eat some humble pie and combine the repositories again with the merge method described above.
