Nuno Martins, Carlos Coutinho

Projects and tools are usually persisted as sets of files, which hold source code, documentation, configuration, support material, and more. The evolution of these files dictates the outcome of the project, so version management tools are mandatory for proper configuration management of the project: they provide traceability of how each of its components evolves.

Git is a version control system (like many others, e.g., SVN, CVS, ClearCase) that is well suited to managing such structures of numerous folders and files: it keeps track of every change made to them, allowing those changes to be analysed and even reverted.

As a project evolves and grows more complex, its resources are usually organised into modules, each concerning a particular part of the project. Over time, however, some of these modules or tools often mature and gain a scope of their own outside the project, frequently with stakeholders different from those of the project. As a result, more and more projects keep their core modules or tools not in subdirectories of a common root, but in separate repositories that can be managed separately and can later evolve independently of the project.

 

Sub-Repositories

The drawback of having several repositories instead of a single one lies in managing them. Suppose the TIMBUS project has n different tools, each with its own independent Git repository containing its code. This approach makes it easy for a developer to clone and evolve a particular TIMBUS tool without also having to retrieve the code for all the other tools.

However, it is much more difficult for a Technical Manager of TIMBUS to maintain control over, or even to obtain the code for, all the tools being developed in the scope of the project, especially when the number of tools is highly dynamic, as is the case in TIMBUS. These code repositories, although within the scope of the TIMBUS project, can be developed by different people, at different paces, and with different management workflows. This can be a serious management challenge.

So, the solution for consolidating all these separate Git repositories is to create an "umbrella" repository, which holds a reference to every other Git repository created in the scope of the TIMBUS project.

 

Pros:

  • Separate Git management workflows allow isolation for each tool development;
  • Module development focus is not lost (only module commits are shown);
  • More control over who can commit where;
  • Umbrella project enables code compilation using all modules.

 

Cons:

  • Can lead to duplication of code;
  • Code dependencies between repositories have to be checked;
  • It may be necessary to change the code compilation path(s).

 

How can this be performed?

Several solutions to these problems already exist. Although the tool selected for collecting some or all of the TIMBUS code repositories was Repo, we also list some other alternatives that were considered.

 


Repo tool

The Repo tool [1] [2] [3] [4] was developed by OHA for the Android Open Source Project [1]. It is a wrapper around Git that facilitates the management of several Git repositories. It has several advantages over Git submodules (see below); however, it is not part of current Linux distributions, so it must be installed before it can be used.

This was the tool selected for gathering all the Git repositories concerning TIMBUS on the site opensourceprojects.eu.

 

The Repo tool is fairly simple to use and manage. Instead of an umbrella repository, it only requires a Git repository containing a file named default.xml. This XML file describes which remote Git servers are used, where they are, and how their repositories are managed on the local system.

 

Example of file default.xml for the Repo tool:

<?xml version="1.0" encoding="UTF-8"?>
<manifest> 

    <remote name="opensourceprojects" fetch="http://opensourceprojects.eu" />

    <default remote="opensourceprojects" sync-c="true" sync-j="4" />

    <project path="local/users-extractor" 
    name="git/p/timbus/context-population/extractors/local/users-extractor-perl" revision="master"/>

    <project path="local/debian-sw" 
    name="git/p/timbus/context-population/extractors/local/debian-sw" revision="master"/>

    <project path="local/network-info-perl" 
    name="git/p/timbus/context-population/extractors/local/network-info-perl" revision="master"/>

    <project path="local/perl-modules" 
    name="git/p/timbus/context-population/extractors/local/perl-modules" revision="master"/>

</manifest>
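
As a rough illustration of how the manifest is read: Repo builds each clone URL by appending a project's name attribute to the fetch URL of its remote, and checks the result out under path. The sketch below reproduces that concatenation in plain shell for one entry of the example above. The variable names are ours and purely illustrative.

```shell
#!/bin/sh
# Values copied from the example default.xml above.
fetch="http://opensourceprojects.eu"
name="git/p/timbus/context-population/extractors/local/debian-sw"
path="local/debian-sw"

# Clone URL = remote fetch URL + "/" + project name
# (a trailing slash on the fetch URL, if any, is dropped first).
url="${fetch%/}/${name}"

echo "repository: $url"
echo "checked out under: $path"
```

This is consistent with the manifest repository URL used in the "repo init" step below, which lives under the same server and path prefix.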

 

Start working with Repo

To work with Repo, first perform the preparation steps that install the Repo environment, and then start using the tool.

 

Preparing the environment for running Repo:

If you prefer, you can run the attached script (see Attachments below) to perform the whole installation.

 

Alternatively, you may perform a manual installation:

Install Step #1: Install the curl application, if it is not already installed. This may be done using:

$ sudo apt-get install curl

 

Install Step #2: Download Repo from the Google repositories into a directory included in the PATH. We suggest the ~/bin directory, which is frequently included in the PATH:

$ mkdir -p ~/bin
$ echo "PATH=~/bin:\$PATH; export PATH" >> ~/.bashrc 
$ . ~/.bashrc

$ curl http://commondatastorage.googleapis.com/git-repo-downloads/repo > ~/bin/repo
$ chmod a+x ~/bin/repo
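
Before moving on, it may help to confirm that the installation worked. The following is a minimal sketch, assuming the ~/bin location suggested above. The helper name path_contains is our own and not part of Repo.

```shell
#!/bin/sh
# Return success if directory $1 is listed in PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "$HOME/bin"; then
    echo "$HOME/bin is on the PATH"
else
    echo "$HOME/bin is not on the PATH; re-check ~/.bashrc"
fi

# command -v prints the location of repo if the shell can find it.
if command -v repo >/dev/null 2>&1; then
    echo "repo found at: $(command -v repo)"
else
    echo "repo not found on the PATH"
fi
```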

 

 

Once the environment is ready for running Repo, it is time to work with it. The steps are:

Run Step #1: Create the base directory where you want all the repositories to be stored:

$ mkdir umbrella_repository
$ cd umbrella_repository

 

Run Step #2: Initialise Repo with the Git repository that contains only the file default.xml, which defines the set of repositories to be gathered:

$ repo init -u http://opensourceprojects.eu/git/p/timbus/context-population/extractors/local/local-extractors

The Repo metadata is stored in a folder named ".repo", created in the folder where you run the "repo init" command. If an error occurs for some reason, you can remove that folder and run the command again.

 

Run Step #3: Now, and every time you need to fetch all the defined repositories, sync them:

$ repo sync
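
The three run steps above can be gathered into one small script, sketched below. The run helper, the bootstrap function and the DRY_RUN switch are our own additions and not part of Repo. By default the script only prints the commands. Setting DRY_RUN=0 executes them, which assumes repo is installed and the server is reachable.

```shell
#!/bin/sh
# Print the commands by default; run with DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
MANIFEST_URL="http://opensourceprojects.eu/git/p/timbus/context-population/extractors/local/local-extractors"
TARGET_DIR="umbrella_repository"

# Either echo a command or execute it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run Steps #1 to #3; stop at the first step that fails.
bootstrap() {
    run mkdir -p "$TARGET_DIR" &&
    run cd "$TARGET_DIR" &&
    run repo init -u "$MANIFEST_URL" &&
    run repo sync
}

bootstrap
```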

 


Git submodules

Git's own answer to this problem is the following:

  • Create all external Git repositories
  • Create a new repository, to which each external repository is added in a separate folder
  • Git keeps track (with some user help) of external Git repositories

 

Some code:

    mkdir umbrella-git; cd umbrella-git
    git init

 
Now we have an empty umbrella Git repository. Let's start adding the external Git repositories.

    git submodule add https://url1 git1
    git submodule add https://url2 git2
    git submodule add https://url3 git3
    git submodule add https://url4 git4
    git submodule add https://url5 git5

 

url1 ... url5 and git1 ... git5 are just example URLs and local folders.

 
This creates the folders git1 to git5 and a .gitmodules file. The .gitmodules file records the URL and location of each external Git repository; this way Git knows that those specific folders are submodules and that it has to treat them differently.
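
For illustration, after the first two "git submodule add" commands above, the .gitmodules file would contain roughly the entries shown in the sketch below. The script writes that sample into a temporary file (an assumption made only to keep the example self-contained) and lists the registered folders.

```shell
#!/bin/sh
sample=$(mktemp)

# Sample .gitmodules content for the placeholder URLs url1 and url2.
cat > "$sample" <<'EOF'
[submodule "git1"]
	path = git1
	url = https://url1
[submodule "git2"]
	path = git2
	url = https://url2
EOF

# Extract the local folder of every submodule entry.
paths=$(sed -n 's/^[[:space:]]*path = //p' "$sample")
echo "$paths"

rm -f "$sample"
```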
 

Now it's time to commit the .gitmodules file and the submodule folders (git1 to git5).
We can do this with:

    git commit -a -s -m "Added all external submodules"

 

Cloning the umbrella Git repository

To clone this kind of repository, it is necessary to tell Git explicitly to check out all submodules recursively, with the command:

    git clone --recursive http://url/umbrella-git.git umbrella-git
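
If the umbrella repository was already cloned without --recursive, the submodule folders exist but are empty, and "git submodule update --init --recursive" fills them in afterwards. The sketch below demonstrates this offline with throwaway repositories in a temporary directory. The names lib, umbrella and plain, and the demo identity, are hypothetical. The protocol.file.allow=always setting is only there because recent Git versions restrict submodules cloned from local paths.

```shell
#!/bin/sh
set -e

# Throwaway identity so the commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A stand-in for one external repository.
git init -q lib
echo "hello" > lib/README
git -C lib add README
git -C lib commit -q -m "Initial commit"

# The umbrella repository references it as a submodule.
git init -q umbrella
git -C umbrella -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git -C umbrella commit -q -m "Added lib as a submodule"

# A plain clone leaves the submodule folder empty ...
git clone -q umbrella plain
# ... until the submodules are initialised and checked out.
git -C plain -c protocol.file.allow=always submodule --quiet update --init --recursive

ls plain/lib
```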

 

Working with submodules

 

Git provides some additional features for working with submodules. Through the foreach subcommand, it is possible to execute a command in each submodule folder.

 

    git submodule foreach command

 

If you need to keep track of new commits in all submodules, you can do the following:

 

    git submodule foreach git remote update

 

Or, if you want to pull all changes from a remote repository into the master branch:
 

    git submodule foreach git pull origin master

 

Or, if you want to tag all submodules with a specific tag:

 

    git submodule foreach git tag -a version-1.0 -m "Code version 1"

 

Drawbacks

 
Even though Git submodules seem to be a great feature, they lack [5] [6] some tools to track the submodules' dynamic behaviour.

 


Git Subtree

Git subtrees [7]

 
 


References

[1]: http://source.android.com/source/developing.html
[2]: http://source.android.com/source/using-repo.html
[3]: http://source.android.com/source/downloading.html
[4]: http://google-opensource.blogspot.pt/2008/11/gerrit-and-repo-android-source.html
[5]: http://blogs.atlassian.com/2013/03/git-submodules-workflows-tips/
[6]: http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/
[7]: https://raw2.github.com/git/git/master/contrib/subtree/git-subtree.txt

Attachments
multi_repository_preparation.sh (1810 bytes)
