Update on 19/09/2015:
Git submodules are often considered the de facto standard for incorporating one Git repository into another, yet there is a
slightly less known option - git subtree. For a long time git subtree was a part of git-contrib and had to be
installed manually. In recent versions git subtree became a part of the standard Git distribution, which makes
it officially an alternative to git submodule.
The text which follows is the original article I posted in 2014, providing one of the many examples of where
git subtree may be used, and showing how to use it.
One of the Git-based solutions to modularize a large code base, or to reuse existing repositories in multiple places,
is using git subtree. A subtree is basically a directory in one branch of the repository that may be managed
in another branch of the same or another repository, and the two can be kept synchronized.
I have found git subtree very useful in the transition period when splitting a large modular code base into a
multitude of repositories.
For example, let’s say there is a project consisting of several modules: every module is managed in a sub-directory of a single source code repository. Now, every module can be moved to its own “clean” repository (or even a branch of the same repository; although…, don’t do that) while those sub-directories in existing branches will reference newly created repositories (branches). New changes continue in the new repositories, and are merged back to sub-directories of previously existing “super” branches.
This clearly sounds very similar to git submodule, but it works completely differently inside, and the
implementation differences bring several advantages:
Full transparency - any tools or workflows that currently depend on the project structure will continue to work well - they’d be using the same old branches, and won’t require anything additional, not even a single additional Git command. So, your CI server won’t notice anything, for example.
Similarly, if necessary, development can still continue on the previously existing development “super” branches, and can be periodically synchronized back to the module repositories or branches.
So, neither workflow changes, yet every module already lives in its own repository. This can be put to good use in the transition period - when not all of the tools or processes are ready for the change.
Hands on git subtree
git subtree - installation
Update 15/09/2015: depending on how you installed Git (Homebrew on Mac, apt-get on Ubuntu, or otherwise), and very likely
if you are using Git 1.8 or later, git subtree is normally already bundled in, and you may skip
this section. Check whether git subtree exists, and proceed with the installation instructions if it doesn’t.
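One way to perform that check is sketched below. It assumes that a bundled git subtree lives in Git's exec path as an executable named git-subtree, and that a manual install would put an executable of the same name somewhere on PATH. The helper name has_git_subtree is made up for this example.

```shell
# has_git_subtree is a hypothetical helper name, not part of Git.
# It returns 0 when a git-subtree executable is found and 1 otherwise.
has_git_subtree() {
    # Bundled installs normally keep the script in Git's exec path.
    [ -x "$(git --exec-path)/git-subtree" ] && return 0
    # Manual installs may place git-subtree elsewhere on PATH.
    command -v git-subtree >/dev/null 2>&1 && return 0
    return 1
}

if has_git_subtree; then
    echo "git subtree is available"
else
    echo "git subtree is missing; follow the installation instructions"
fi
```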
If the git subtree command is not available, it can be installed from the Git contrib project. Installation is
explained in its GitHub repo, or one may try one of the alternative ways,
especially on Windows, where make is not available in Git Bash.
And yes, this is only required on those systems, where a subtree will be manipulated, but it isn’t required where the repository will be used for regular fetch-merge-commit-push workflow.
Moving directories to external repositories or branches
Let’s say the project in question consists of several modules (“module-1”..“module-n”), and one would like to split off some of them (or maybe all of them).
$ git checkout source-branch
$ git subtree split --rejoin --prefix=module-n --branch module-n
Notice that the new branch, module-n, now contains the entire history of all changes made to the module-n directory
of the source branch, and only the changes made to its files. The newly created module-n branch contains only
the module's content, all in its root now, not in a module-n subdirectory. It is also an orphan Git branch - that is,
it starts its own history from zero, just like the master branch for example. Finally, --rejoin also merged this
branch into the source branch, to make it explicit that this content is incorporated into the source branch from
the new module branch.
This is the initial setup. As it stands, it is already pretty nice - basically the module is fully extracted into its own branch, with its own history. The new module branch can be pushed to a new repository for example, but the advantage of git subtree is that this is not even necessary - you can still keep everything in one repository.
Synchronizing subtree and its branch
Now both the module branch and the branch that includes it can be kept synchronized, whether the changes are made on either of them, or even on both of them.
If one wants to focus on one module only: check out the module branch, make feature branches, commit one or multiple changes, do the usual stuff. Unless explicitly requested, all your work stays in this branch only. But then later, this can all be delivered back to the “super” branch, with:
$ git checkout target-merge-branch
$ git subtree merge [--squash] --prefix=module-n module-n
--squash delivers everything in one commit; without it, every commit made on the module-n branch is brought into
the history of the target merge branch.
The good thing is that in the target branch, the module source can be updated to any point of the module branch, not
necessarily the latest one. And, when using --squash, not even sequentially (that’s right, you can basically roll back
to an older state if needed). Just specify the commit or tag to merge, just like you would with a usual git merge.
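A minimal round trip, from work on the module branch back to the “super” branch, might look like the sketch below. It assumes git subtree is installed and uses a throwaway repository with invented file names. It runs the merge without --squash and passes -m so the merge commit gets a message without opening an editor.

```shell
# Throwaway repository; file names and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git checkout -q -b source-branch

mkdir module-1 module-n
echo "app v1" > module-1/app.txt
echo "lib v1" > module-n/lib.txt
git add . && git commit -q -m "Add both modules"
git subtree split --rejoin --prefix=module-n --branch module-n

# Work on the module in isolation: its files live at the branch root.
git checkout -q module-n
echo "lib v2" > lib.txt
git commit -q -am "Improve the library"

# Deliver the module changes back into the module-n/ directory.
git checkout -q source-branch
git subtree merge --prefix=module-n -m "Merge module-n changes" module-n

merged_lib=$(cat module-n/lib.txt)
untouched_app=$(cat module-1/app.txt)
echo "module-n/lib.txt: $merged_lib, module-1/app.txt: $untouched_app"
```

To merge an older state instead of the branch tip, the last argument of git subtree merge can be a commit or tag on the module branch.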
Next, it isn’t required that this is only merged with the original source branch. Let’s say for example, that you as well want to deliver some older, more stable version of module to the release branch:
$ git checkout release_1.0
$ git subtree add --prefix=module-n modn.v.0.9
Now the release branch contains an older version of one module (found by the tag modn.v.0.9), and likewise it can
contain other versions of other modules (mod2.v.1.2, etc.). The same wouldn’t be possible if all modules were managed
in one development branch only, would it?
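The release scenario can be sketched as follows, under two assumptions that are not in the text above: the release branch does not have a module-n directory yet (git subtree add refuses to run if the prefix already exists), and the tag was placed on the module branch after an earlier split. File names and contents are invented.

```shell
# Throwaway repository; file names and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git checkout -q -b source-branch

echo "demo" > README
git add . && git commit -q -m "Initial commit"
git branch release_1.0                      # release line without module-n

mkdir module-n
echo "lib 0.9" > module-n/lib.txt
git add . && git commit -q -m "module-n 0.9"
git subtree split --prefix=module-n --branch module-n
git tag modn.v.0.9 module-n                 # tag the older module state

echo "lib 1.0" > module-n/lib.txt
git commit -q -am "module-n 1.0"
git subtree split --prefix=module-n --branch module-n

# Bring the tagged, older module version into the release branch.
git checkout -q release_1.0
git subtree add --prefix=module-n modn.v.0.9

release_lib=$(cat module-n/lib.txt)
echo "release branch has: $release_lib"
```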
But what about changes in the original source branch? What if one still wants to work on the old source branch, without
switching to module-specific branches? Yes, this is also possible. The point about git subtree split is that
subsequent splits are consistent. Even if one made multiple changes in the original source branch after the split, and
even if they were mixed with changes to other modules, and even if they were mixed in the same commits, all this is
still OK. Repeating the same git subtree split command over time will just update the module branch with the changes
made to the module as a subdirectory in the original branch:
$ git subtree split --rejoin --prefix=module-n --branch module-n
This also works fine if changes were made in both branches - a regular git merge happens in that case, with regular conflict resolution if any conflicts exist.
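Repeating the split after further work on the source branch can be sketched like this. The example assumes git subtree is installed and that no unmerged commits were made on the module branch in the meantime. File names and contents are invented.

```shell
# Throwaway repository; file names and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git checkout -q -b source-branch

mkdir module-1 module-n
echo "app v1" > module-1/app.txt
echo "lib v1" > module-n/lib.txt
git add . && git commit -q -m "Add both modules"
git subtree split --rejoin --prefix=module-n --branch module-n
before=$(git rev-list --count module-n)

# One commit on the source branch that mixes changes to two modules.
echo "app v2" > module-1/app.txt
echo "lib v2" > module-n/lib.txt
git commit -q -am "Change both modules at once"

# Repeating the same split appends the new module change to the branch.
git subtree split --rejoin --prefix=module-n --branch module-n
after=$(git rev-list --count module-n)
module_lib=$(git show module-n:lib.txt)
module_files=$(git ls-tree --name-only module-n)
echo "commits: $before -> $after, lib.txt: $module_lib"
```

Only the module-n part of the mixed commit reaches the module branch; the change to module-1 stays behind in the source branch.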
So, when compared to git submodule, I would say that git subtree differs in the following ways:
subtree is more complex to set up correctly: unlike submodule, its setup is a little more complicated, and it may require quite complex merges if changes are expected to be made on both sides;
on the other hand, subtree is much easier to use for those who don’t set it up or merge - it is absolutely transparent to the users once it has been configured, and doesn’t require any additional knowledge or any additional Git commands; unless, again, you make changes in both places in parallel;
subtree is quite flexible in how it allows making changes in both places.
A verdict? I have found git subtree very useful in the scenario presented above - in the transition period when
splitting a large modular code base into a multitude of repositories. For other use cases, I think git submodule is
better suited to reusing one repository in another - even though it requires a couple of additional commands, it is
more straightforward to use, exactly because it is explicit in what it does.