Plink PED / MAP file format for unrelated individuals & nuclear families

This is the default text file format used by Plink. It is convenient if you want to edit the files in a text editor. However, large datasets take up substantial disk space in this format and therefore take longer to parse. A detailed description of the PED/MAP file format can be found in the Plink documentation. As a brief illustration, consider a dataset of seven individuals: two unrelated individuals, a father-mother-child trio (TRIOF, TRIOM, TRIOC) and a parent-child duo (DUOP, DUOC).

All individuals were genotyped at 3 SNPs (SNP1, SNP2 & SNP3), giving the following genotype data, where ?/? denotes a missing genotype:

       SNP1  SNP2  SNP3
IND1   A/A   T/T   ?/?
IND2   A/G   T/C   A/T
TRIOF  A/G   T/C   A/T
TRIOM  A/G   T/C   A/T
TRIOC  A/A   C/T   A/T
DUOP   G/A   T/C   A/A
DUOC   A/A   T/C   A/A

PED file

The PED file describes the individuals and the genetic data. The PED file corresponding to the example dataset is:

FAM1 IND1  0     0     1 0 A A T T 0 0
FAM2 IND2  0     0     1 0 A G T C T A
FAM3 TRIOF 0     0     1 0 A G T C A T
FAM4 TRIOM 0     0     2 0 A G T C A T
FAM5 TRIOC TRIOF TRIOM 1 0 A A C T A T
FAM6 DUOP  0     0     2 0 G A T C A A
FAM7 DUOC  DUOP  0     2 0 A A T C A A

This file can be SPACE or TAB delimited. Each line corresponds to a single individual. The first 6 columns are:

  1. Family ID [string]
  2. Individual ID [string]
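  3. Father ID [string, 0 if the father is not in the dataset]
  4. Mother ID [string, 0 if the mother is not in the dataset]
  5. Sex [integer, 1 = male, 2 = female]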
  6. Phenotype [float]

Columns 7 & 8 code for the observed alleles at SNP1, columns 9 & 10 code for the observed alleles at SNP2, and so on. Missing data are coded as "0 0", as for SNP3 of IND1 in the example. The file should have N lines and 2L+6 columns, where N and L are the numbers of individuals and SNPs in the dataset, respectively. In the example, N = 7 and L = 3, so the file has 7 lines and 12 columns.

Each individual must have a unique ID containing only alphanumeric characters.
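
The column layout described above can be checked with a few lines of Python. The following is only a minimal sketch, not part of Plink: the names PedRecord and parse_ped_line are hypothetical, and the number of SNPs is assumed to be known in advance (for instance from the MAP file).

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    """One line of a PED file (hypothetical helper type)."""
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: int
    phenotype: float
    genotypes: list  # one (allele1, allele2) tuple per SNP, None if missing


def parse_ped_line(line, n_snps):
    """Parse one PED line and check that it has 2L+6 columns."""
    fields = line.split()  # splits on any run of spaces or tabs
    expected = 6 + 2 * n_snps
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, found {len(fields)}")
    alleles = fields[6:]
    genotypes = []
    for i in range(0, len(alleles), 2):
        pair = (alleles[i], alleles[i + 1])
        # "0 0" is the missing-genotype code described above
        genotypes.append(None if pair == ("0", "0") else pair)
    return PedRecord(fields[0], fields[1], fields[2], fields[3],
                     int(fields[4]), float(fields[5]), genotypes)


# First line of the example PED file, with L = 3 SNPs
record = parse_ped_line("FAM1 IND1  0     0     1 0 A A T T 0 0", n_snps=3)
print(record.individual_id, record.genotypes)
# prints: IND1 [('A', 'A'), ('T', 'T'), None]
```

A line with the wrong number of columns raises a ValueError, which is usually the first symptom of a PED file that does not match its MAP file.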

MAP file

The MAP file describes the SNPs. The MAP file corresponding to the example dataset is:

7 SNP1 0 123
7 SNP2 0 456
7 SNP3 0 789

This file can be SPACE or TAB delimited. Each line corresponds to a SNP. The 4 columns are:

  1. Chromosome number [integer]
  2. SNP ID [string]
  3. SNP genetic position (cM) [float]
  4. SNP physical position (bp) [integer]

This file should have L lines and 4 columns, where L is the number of SNPs contained in the dataset.

Each SNP must have a unique physical position. All the SNPs must be ordered by physical position.
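
The two constraints above (unique and ordered physical positions) can be verified before running any analysis. The following Python sketch is an illustration only: the function name read_map is hypothetical, and it assumes that the file holds a single chromosome, as in the example, so that positions can be compared across all lines.

```python
def read_map(lines):
    """Parse MAP lines and check the 4-column layout and position ordering."""
    snps = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, found {len(fields)}")
        chrom, snp_id, cm, bp = fields
        snps.append((int(chrom), snp_id, float(cm), int(bp)))

    # Strictly increasing positions imply both uniqueness and correct ordering.
    for previous, current in zip(snps, snps[1:]):
        if current[3] <= previous[3]:
            raise ValueError(
                f"{current[1]} (bp {current[3]}) is not after "
                f"{previous[1]} (bp {previous[3]})"
            )
    return snps


# The example MAP file from above
example_map = """7 SNP1 0 123
7 SNP2 0 456
7 SNP3 0 789"""

snps = read_map(example_map.splitlines())
print([snp_id for _, snp_id, _, _ in snps])
# prints: ['SNP1', 'SNP2', 'SNP3']
```

The length of the returned list gives L, which can then be used to check that every PED line has 2L+6 columns.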

PED/MAP to BED/BIM/FAM conversion

To convert myPlinkTextData.ped and myPlinkTextData.map to Plink binary format (BED/BIM/FAM), use Plink as follows:

plink --file myPlinkTextData --make-bed --out myPlinkBinaryData
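
When the conversion is part of a larger pipeline, the same command can be launched from a script. The Python sketch below is one possible way to do so. The helper names make_bed_command, expected_outputs and convert are hypothetical, and the sketch assumes that the plink executable is available on the PATH.

```python
import subprocess


def make_bed_command(text_prefix, binary_prefix):
    """Build the argument list for the PED/MAP to BED/BIM/FAM conversion."""
    return ["plink", "--file", text_prefix, "--make-bed", "--out", binary_prefix]


def expected_outputs(binary_prefix):
    """Names of the three binary-format files written under the --out prefix."""
    return [binary_prefix + ext for ext in (".bed", ".bim", ".fam")]


def convert(text_prefix, binary_prefix):
    """Run the conversion; assumes 'plink' can be found on the PATH."""
    # check=True raises CalledProcessError if plink exits with a non-zero status
    subprocess.run(make_bed_command(text_prefix, binary_prefix), check=True)
    return expected_outputs(binary_prefix)


# Show the command without running it
print(" ".join(make_bed_command("myPlinkTextData", "myPlinkBinaryData")))
# prints: plink --file myPlinkTextData --make-bed --out myPlinkBinaryData
```

Calling convert("myPlinkTextData", "myPlinkBinaryData") would run the command and return the names of the files to look for afterwards.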

PED/MAP to GEN/SAMPLE conversion

To convert myPlinkTextData.ped and myPlinkTextData.map to GEN/SAMPLE format, use the GTOOL software package as follows:

gtool -P --ped myPlinkTextData.ped --map myPlinkTextData.map --og myGtoolTextData.gen --os myGtoolTextData.sample

Note that GTOOL has more options to tweak the conversion between formats.