SHAPEIT v2.727 (25 September 2013)
The main improvement in SHAPEIT version 2 is the new model used for the conditioning haplotypes. It is a generalisation of the "surrogate family" phasing idea that can now be applied at the whole-chromosome scale. It results in improved accuracy and speed compared to version 1. This new model is used by default and can be controlled via two parameters: the average window size (W) and the number of haplotypes to be considered per window (K). The default values of these parameters can be changed using the options --window and --states respectively. The graph representation of the conditioning haplotypes (the underlying model of version 1) can still be used if you add the --version1 flag to the SHAPEIT command line.
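As a rough illustration of the two parameters, the sketch below builds a command line that sets K and W explicitly to the default values listed in the changes that follow. The file prefix "gwas" and the output prefix "gwas.phased" are hypothetical, and passing a single prefix to --output-max is an assumption rather than something stated on this page.

```shell
# Hypothetical file prefix (gwas.bed, gwas.bim, gwas.fam); replace with your own.
PREFIX="gwas"

# --states sets K (haplotypes considered per window),
# --window sets W (average window size).
# Assumption: --output-max accepts an output prefix.
CMD="shapeit --input-bed $PREFIX --states 100 --window 2 --output-max $PREFIX.phased"

# Always show the command; run it only if the binary and the input exist.
echo "$CMD"
if command -v shapeit >/dev/null 2>&1 && [ -f "$PREFIX.bed" ]; then
    $CMD
fi
```

Raising --states or changing --window moves away from the defaults, so it is worth recording the values used alongside the SHAPEIT version.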
Major changes made
- The SHAPEIT2 model is now used by default with --states 100 and --window 2.
- The SHAPEIT1 model can still be used if the --version1 flag is added to the command line.
- Option --chrX to phase X chromosomes has been improved: better compatibility with IMPUTE2, more checks for heterozygous haploids, and the possibility of using a reference panel.
- Option --input-ref to phase using a reference panel has been improved: SNP alignment between the study genotypes and the reference panels is much easier.
- Subsets of the reference panel can be ignored using the --exclude-grp and --include-grp options. This makes things easier if you want to consider only a subset of the reference haplotypes.
- MCMC iterations can be skipped when using a reference panel by adding the --no-mcmc flag. Each individual is then phased in turn using only the reference panel of haplotypes.
- Default number of iterations has been halved from 70 [10B+10P+50M] to 35 [7B+8P+20M].
- Haplotype graphs now require a single file for storage, instead of 3 as in version 1.
- Screen output has been clarified.
- The --version issue has been fixed.
- To reduce command line length: (a) many options now have a short name (e.g. --input-bed can be written as -B), and (b) a set of files sharing a common prefix can be specified by the prefix alone (e.g. --input-bed file.bed file.bim file.fam can be written as --input-bed file).
- A new mode has been added: shapeit -check [options]. It contains all functions related to data verification. It reads all input files and spots any problems related to call rates, Mendel error rates or heterozygous haploids. Summary statistics are written to several automatically generated log files.
- A new mode has been added: shapeit -convert [options]. It contains useful functions to manipulate SHAPEIT output files: (a) haplotype graph files generated using --output-graph and (b) haplotypes generated using --output-max.
- shapeit.v2.r727.linux.x64 (1856)
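The command-line shortcuts described in the list above can be sketched as follows. The file names are the illustrative ones used in the notes. Combining the short name -B with the prefix form, and passing -B to the -check mode, are assumptions based on those notes rather than documented examples.

```shell
# Long form: the three PLINK files are listed explicitly.
LONG="shapeit --input-bed file.bed file.bim file.fam"

# Prefix form: only the common prefix is given.
PREFIX_FORM="shapeit --input-bed file"

# Short-name form: -B stands for --input-bed (assumed to accept a prefix too).
SHORT="shapeit -B file"

# Data verification mode on the same input (assumed option combination).
CHECK="shapeit -check -B file"

# Print the candidate command lines without running them.
printf '%s\n' "$LONG" "$PREFIX_FORM" "$SHORT" "$CHECK"
```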
What version of SHAPEIT do I have?
To know the SHAPEIT version you have, use this command:
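A minimal sketch, assuming the --version option mentioned in the v2.727 changes is the one meant here and that the binary is installed on your PATH under the name "shapeit":

```shell
# Assumption: the executable is called "shapeit"; downloaded binaries may
# carry a longer name such as shapeit.v2.r727.linux.x64.
SHAPEIT_BIN="shapeit"

# Print the version if the binary can be found, otherwise say so.
if command -v "$SHAPEIT_BIN" >/dev/null 2>&1; then
    "$SHAPEIT_BIN" --version
else
    echo "$SHAPEIT_BIN not found on PATH"
fi
```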
How to ask a question, report a problem and hear about new versions?
Subscribe to the OXSTATGEN mailing list.
If you experience any problems with SHAPEIT, here is some advice before mailing:
- Make sure you are using the latest version since your problem may have already been fixed
- Check the screen output and the log file carefully. The problem may be reported there, since both contain many details about what is going on.
If the problem persists, ask your question on the OXSTATGEN mailing list and we will be happy to answer!
SHAPEIT v1.X (old versions)
It is strongly advised to download and use the latest version (that you can find below) instead of these versions, since some minor bugs have been corrected.
v1.ESHG (23 June 2012)
- Major: New option --chrX to phase X chromosomes.
- Major: New option --input-ref to phase using a reference panel of haplotypes (1KGP).
- Major: More complete log files with many useful statistics that can be directly input into R: missingness per SNP/individual, allele frequencies, Mendel errors, heterozygous haploids, etc.
- Minor: All input files can be given Gzipped or Bzipped as long as the correct file extension is given (.gz or .bz2). They will be decompressed internally.
- Minor: All output files can be compressed internally using Gzip or Bzip2 as long as the correct file extension is given (.gz or .bz2).
- Minor: Any information in the input sample file, such as phenotypes, is propagated to the output sample file. The same applies to PED and FAM files.
- Minor: Screen output was reduced.
- Minor: Unphased genotype files can be given in a more parsimonious way. For example, --input-bed file.bed file.bim file.fam can now be written as --input-bed file.
- shapeit.v1.ESHG.linux.x64 (1484)
- Minor: bugfix in trio/duo phasing.
- shapeit.v1.r532.linux.x64 (987)
- shapeit.v1.r532.MacOSX (850)
- shapeit.v1.r532.solaris.32bit (724)
- shapeit.v1.r532.solaris.64bit (807)
- Minor: When running on ped/map files at monomorphic SNPs the output files listed the two alleles as the observed allele and 0. This has now been changed so that the observed allele is listed twice. This provides better compatibility with IMPUTE2.
- shapeit.v1.r416.linux.x64 (14 Dec 2011) (1144)
- shapeit.v1.r416.MacOSX (14 Dec 2011) (844)
- shapeit.v1.r416.Itanium (14 Dec 2011) (770)
- shapeit.v1.r416.Solaris.32bit (14 Dec 2011) (669)
- shapeit.v1.r416.Solaris.64bit (14 Dec 2011) (640)
- Major: Unrelateds are ordered in the same way in input and output files.
- Minor: Covariates and phenotypes in the input sample file are copied to the output sample file (this still does not work when the option --output-graph is used).
- Minor: When output window coordinates are not consistent with input window coordinates, they are automatically fixed.
- Major: SNPs are not uniquely identified by their ID anymore, but rather by their physical position.
- Minor: Compiled on older Linux version for better compatibility.
- Minor: Improved initial data checking (call rate / Mendel error rate).
- Minor: New data sub-setting options ( --include-ind --exclude-ind --include-snp --exclude-snp --output-from --output-to )
- Minor: New testing option to check that data reading is OK ( --test-reading )
- shapeit.v1.r331.linux.x64 (20 Nov 2011) (754)
Academic License Agreement
The bioinformatics department of Conservatoire National des Arts et Métiers (CNAM) has developed a new algorithm for a faster computation of hidden Markov models, based on graph representations. This algorithm has been notably applied for the reconstruction of haplotypes from population genotypic data leading to the SHAPEIT software. This algorithm and its applications, including SHAPEIT, are patent pending. The Conservatoire National des Arts et Métiers (CNAM), Prof. Jean-François ZAGURY and his group of the bioinformatics department (the developers), give permission for you and your laboratory (Institution) to use SHAPEIT. CNAM and the developers allow researchers at your Institution to copy and modify SHAPEIT for internal, non-profit research purposes, on the following conditions:
The SHAPEIT software remains at your Institution and is not published, distributed, or otherwise transferred or made available to other than Institution employees and students involved in research under your supervision.
If you wish to obtain SHAPEIT for any commercial purposes or for diffusion through the internet, you will need to execute a separate licensing agreement with CNAM and pay a fee.
This includes, but is not limited to, using SHAPEIT to provide services to outside parties for a fee. In that case please contact :
Pr. Zagury, CNAM.
Tel : 33 1 58 80 88 20
Mail : zagury at cnam.fr
- You retain in SHAPEIT and any modifications to SHAPEIT, the copyright, trademark, or other notices pertaining to SHAPEIT as provided by CNAM.
You provide the developers with feedback on the use of SHAPEIT in your research, and that the Developers and CNAM are permitted to use any information you provide in making changes to the SHAPEIT software.
All bug reports and technical questions shall be sent to:
Dr. Delaneau, CNAM.
Mail: olivier.delaneau at gmail.com
- You acknowledge that the developers, CNAM and its licensees may develop modifications to SHAPEIT that may be substantially similar to your modifications of SHAPEIT, and that the developers, CNAM and its licensees shall not be constrained in any way by you in CNAM's or its licensees' use or management of such modifications. You acknowledge the right of the developers and CNAM to prepare and publish modifications to SHAPEIT that may be substantially similar or functionally equivalent to your modifications and improvements, and if you obtain patent protection for any modification or improvement to SHAPEIT you agree not to allege or enjoin infringement of your patent by the Developers, CNAM or by any of CNAM's licensees obtaining modifications or improvements to SHAPEIT from the CNAM or the Developers.
If utilisation of the SHAPEIT software results in outcomes which will be published, please specify the version of SHAPEIT you used and cite the source below :
O. Delaneau, J. Marchini, JF. Zagury. A linear complexity phasing method for thousands of genomes. Nature Methods 9(2):179-81. Published online 2011 Dec 4. doi: 10.1038/nmeth.1785.
- Any risk associated with using the SHAPEIT software at your institution is with you and your Institution. SHAPEIT is experimental in nature and is made available as a research courtesy "AS IS," without obligation by CNAM to provide accompanying services or support. CNAM AND THE AUTHORS EXPRESSLY DISCLAIM ANY AND ALL WARRANTIES REGARDING THE SOFTWARE, WHETHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES PERTAINING TO MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Commercial License Agreement
A specific license must be obtained for any commercial or for-profit organization or for any web-diffusion purpose.
For more information, contact Prof. Zagury at:
Chaire de Bioinformatique
292 rue Saint-Martin
75003 - PARIS
Tel : 33 1 58 80 88 20
Mail : zagury at cnam.fr