Automated installation

GEMINI contains an automated installation script which installs GEMINI along with required Python dependencies, third party software and data files.

$ wget

While other tools (e.g., curl) can be used to download this file, wget is required by the installation script, so please install wget if it is not already available.

Once you have downloaded the above installation script, you can proceed as follows:

$ python /usr/local /usr/local/share/gemini
$ export PATH=$PATH:/usr/local/gemini/bin
# it would be wise to add the above line to your ``.bashrc`` or ``.bash_profile``

This installs the GEMINI executable as /usr/local/bin/gemini, other required third party dependencies in /usr/local/bin, and associated data files in /usr/local/share/gemini. Please note that this is merely an example: one can place the GEMINI executables and annotation files in any directories one wishes.


The automated installation script typically requires ~30 minutes, primarily owing to the time required to download the GEMINI genome annotation files. Also, please note that the annotation files require ~15 GB of storage, so please ensure that the data directory (/usr/local/share/gemini in the example above) has sufficient space.
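Before starting the download, it can help to confirm that the target filesystem has room for the annotation files. The following is a minimal sketch, not part of GEMINI: the function name has_free_space is hypothetical, and the 15 GB threshold is simply the approximate figure quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with GEMINI): succeed if the filesystem
# holding directory $1 has at least $2 gigabytes available.
has_free_space() {
    dir=$1
    need_gb=$2
    # df -P gives one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example: check the current directory against the ~15 GB mentioned above.
if has_free_space . 15; then
    echo "enough space for the annotation files"
else
    echo "less than 15 GB free; consider another data directory"
fi
```

If the data directory does not exist yet, point the check at its nearest existing parent (for example /usr/local/share), since df needs an existing path.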


Some tips and tricks for installation issues:

  1. Some older versions of wget have certificate problems with GitHub files. If you run into this problem, you can alternatively download the install script using ``wget --no-check-certificate`` or curl -O.
  2. The installation script is idempotent and you can re-run it multiple times without any issues. If you experience internet connectivity or other transient errors during installation, a re-run can often solve the problem (fingers crossed).
  3. If you’re installing behind a proxy you’ll need to set proxy information in a condarc file and also set all_proxy, http_proxy and https_proxy in your .bashrc file. See this mailing list discussion for more information.
  4. The installer tries to ignore site-wide environment variables pointing to other Python installations, but if you have issues with Python libraries being pulled in from other locations, you can clear them by setting or unsetting the following environment variables before running the installer: unset PYTHONPATH && unset PYTHONHOME && export PYTHONNOUSERSITE=1
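Tips 3 and 4 can be combined into a few lines of shell to run (or place in your .bashrc) before invoking the installer. This is only a sketch: the proxy address below is a made-up placeholder, and the condarc proxy settings mentioned in tip 3 still need to be configured separately.

```shell
# Placeholder proxy address (an assumption): replace with your site's real proxy.
proxy_url="http://proxy.example.org:3128"

# Tip 3: proxy variables named in the text above.
export http_proxy="$proxy_url"
export https_proxy="$proxy_url"
export all_proxy="$proxy_url"

# Tip 4: keep other Python installations out of the installer's way.
unset PYTHONPATH PYTHONHOME
export PYTHONNOUSERSITE=1
```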


The installer requires:

  • Python 2.7.x
  • git
  • wget
  • a working C / C++ compiler such as gcc
  • zlib (including headers)

These dependencies already exist on most UNIX/LINUX/OS X machines. However, on minimalist machines such as fresh Amazon Cloud instances, one may have to install these dependencies before running the automated installer. In the case of Amazon instances, the following command should take care of all of the above dependencies:

$ sudo yum -y install python27 git wget gcc gcc-c++ zlib-devel
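On other systems, a quick way to see which of the command-line prerequisites are absent is to probe the PATH. The sketch below is an illustration only: missing_tools is a hypothetical helper, the tool names assume the interpreter is installed as python and the compilers as gcc and g++, and the zlib headers cannot be detected this way.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools python git wget gcc g++)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "All command-line prerequisites found."
fi
```

A tool being present does not guarantee the right version; python --version is still worth checking by hand against the 2.7.x requirement.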

Installing without root access

As many users do not have root or sudo access, the automated installer also has options to install in “non-root” environments:

$ python ~/gemini ~/gemini --nosudo

Updating your PATH to find the GEMINI executable

At this point, you will have a self-contained installation of GEMINI, including both the software and its associated genome annotations. However, if you have done a custom install in a “non-root” environment, you will first need to update your PATH environment variable to include the path to the bin directory that you just created by running the automated installer.

For example, if, as above, you placed your custom install in ~/gemini, you would need to update your PATH as follows:

$ export PATH=$PATH:~/gemini/bin

Note that this change will only last for the life of your current terminal session. To make this more permanent, update your .bash_profile so that this change is made each time you login.
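One way to make the change permanent without adding the same line on every re-run is to append it only when it is not already present. This is a sketch under assumptions: add_to_path_once is a hypothetical helper, and ~/.bash_profile and ~/gemini/bin are the example locations used above.

```shell
# Hypothetical helper: append a PATH line to a profile file unless the
# exact line is already there.
# $1: profile file, $2: directory to add to PATH
add_to_path_once() {
    profile=$1
    line="export PATH=\$PATH:$2"
    # -F: fixed string, -x: whole-line match, -q: quiet
    grep -Fqx "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}

# Example usage (commented out so that pasting this block changes nothing):
# add_to_path_once ~/.bash_profile ~/gemini/bin
```

The shell expands ~ before the function sees it, so the line written to the profile contains the absolute path to the bin directory.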

If successful, you should be able to run the following command from anywhere on your system:

$ gemini -v
gemini 0.3.0b

Running unit tests

gemini only installs the software itself, not the source repository with the tests. To run the tests, do:

$ git clone
$ cd gemini
$ git checkout v0.18.3 # or the current version that you installed
$ bash

Updating the GEMINI executables and annotations

Once installed with the automated installer, it is easy to upgrade the GEMINI programs and associated data files to the latest released version with:

$ gemini update

There are also flags to update to the latest development version of the code or to only update data files without updating the source:

$ gemini update --devel
$ gemini update --dataonly

To download optional large files associated with specific analyses in GEMINI, like GERP scores per base pair and CADD scores, pass the --extra flag:

$ gemini update --dataonly --extra cadd_score
$ gemini update --dataonly --extra gerp_bp

Software dependencies

GEMINI depends upon several widely used genomics command-line tools as well as multiple Python packages. We recognize that the dependency stack is quite deep and are working on ways to minimize dependencies in the interest of the most streamlined installation process possible. Nonetheless, the following are core dependencies:

  1. Python 2.7.x
  2. grabix
  3. tabix (used only by the annotate tool)
  4. bedtools (used only by the windower tool)
  5. pybedtools (used only by the windower tool)

Manual installation

Once the above dependencies have been installed, one can begin installing GEMINI itself. To install you should download the latest source code from GitHub, either by going to:

and clicking on “Downloads”, or by cloning the git repository with:

$ git clone

Once you have the source code, run:

$ cd gemini
$ sudo python install

to install it. If you don’t have permission to install it in the default directory, you can simply build the source in-place and use the package from the git repository:

$ python build_ext --inplace

Installing annotation files

One of the more appealing features in GEMINI is that it automatically annotates variants in a VCF file with several genome annotations. However, you must first install these data files on your system. To do so, run the following script and give it the full path of the directory in which you would like to install the data files. The recommended path is /usr/local/share, but you can install the data files wherever you want.

$ python gemini/ /usr/local/share/


Annotation files such as GERP at base pair resolution and CADD scores are not part of this default installation owing to their large file size. They may, however, be installed as additional data files by running gemini update --dataonly with the --extra flag set to gerp_bp or cadd_score.

Using previously installed annotation files

If you have installed GEMINI with the annotation files on a server and you can NFS mount the annotation files, you can tell a local install of GEMINI where those annotation files are by making the file ~/.gemini/gemini-config.yaml:

annotation_dir: /path/to/nfs_mounted/gemini/data
versions:
  GRCh37-gms-mappability.vcf.gz: 2
  hg19.rmsk.bed.gz: 2
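After creating the config file, it is worth confirming that the directory it names is actually visible on the local machine, since an unmounted NFS share is an easy mistake to miss. The sketch below is an illustration, not a GEMINI command: annotation_dir_from_config is a hypothetical helper that uses a simple sed match rather than a real YAML parser, so it assumes the annotation_dir key starts at the beginning of a line and has no trailing whitespace or comment.

```shell
# Hypothetical helper: print the annotation_dir value from a GEMINI
# config file ($1). Simple text match, not a full YAML parser.
annotation_dir_from_config() {
    sed -n 's/^annotation_dir:[[:space:]]*//p' "$1"
}

config="$HOME/.gemini/gemini-config.yaml"
if [ -f "$config" ]; then
    dir=$(annotation_dir_from_config "$config")
    if [ -d "$dir" ]; then
        echo "annotation directory found: $dir"
    else
        echo "annotation directory missing or not mounted: $dir"
    fi
fi
```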

Running the testing suite

GEMINI comes with a full test suite to make sure that everything has installed correctly on your system. We strongly encourage you to run these tests.

$ bash
