Plink PED / MAP file format for Unrelateds & Nuclear families
This is the default text file format used by Plink. It is convenient if you want to edit the files in a text editor. However, for large datasets the files take up substantial disk space and are therefore slow to parse. A detailed description of the PED/MAP file format can be found in the Plink documentation. As a brief illustration, consider the following individuals:
- 2 Unrelateds (UNR1 & UNR2).
- 1 Trio (father=TRIOF, mother=TRIOM & child=TRIOC).
- 1 Duo (parent=DUOP & child=DUOC).
All the individuals were typed on 3 SNPs (SNP1, SNP2 & SNP3) giving the following genetic data:
ID     SNP1  SNP2  SNP3
UNR1   A/A   T/C   ?/?
UNR2   A/G   T/C   A/T
TRIOF  A/G   T/C   A/T
TRIOM  A/G   T/C   A/T
TRIOC  A/A   C/T   A/T
DUOP   G/A   T/C   A/A
DUOC   A/A   T/C   A/A
The PED file describes the individuals and the genetic data. The PED file corresponding to the example dataset is:
FAM1 UNR1 0 0 1 0 A A T C 0 0
FAM2 UNR2 0 0 1 0 A G T C A T
FAM3 TRIOF 0 0 1 0 A G T C A T
FAM3 TRIOM 0 0 2 0 A G T C A T
FAM3 TRIOC TRIOF TRIOM 1 0 A A C T A T
FAM4 DUOP 0 0 2 0 G A T C A A
FAM4 DUOC 0 DUOP 2 0 A A T C A A
This file can be SPACE or TAB delimited. Each line corresponds to a single individual. The first 6 columns are:
- Family ID [string]
- Individual ID [string]
- Father ID [string, 0 if unknown]
- Mother ID [string, 0 if unknown]
- Sex [integer, 1=male, 2=female]
- Phenotype [float, 0 or -9 if missing]
Columns 7 & 8 code for the observed alleles at SNP1, columns 9 & 10 for the observed alleles at SNP2, and so on. Missing genotypes are coded as "0 0", as for SNP3 of the first individual in the example. The file should have N lines and 2L+6 columns, where N and L are the numbers of individuals and SNPs in the dataset, respectively.
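The column layout described above can be checked with a few lines of Python. This is only a minimal sketch, not part of Plink: the names PedRecord, parse_ped_line and is_missing are hypothetical helpers introduced here for illustration, and the sketch assumes single-token alleles separated by SPACE or TAB.

```python
from typing import List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    """One PED line: 6 leading columns, then one allele pair per SNP."""
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: int
    phenotype: float
    genotypes: List[Tuple[str, str]]


def parse_ped_line(line: str) -> PedRecord:
    # str.split() with no argument splits on any run of spaces or tabs.
    fields = line.split()
    # A valid line has 6 + 2L columns, so the remainder must be even.
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError(f"expected 2L+6 columns, got {len(fields)}")
    alleles = fields[6:]
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRecord(fields[0], fields[1], fields[2], fields[3],
                     int(fields[4]), float(fields[5]), genotypes)


def is_missing(genotype: Tuple[str, str]) -> bool:
    # Missing genotypes are coded as "0 0".
    return genotype == ("0", "0")


record = parse_ped_line("FAM3 TRIOF 0 0 1 0 A G T C A T")
print(record.individual_id, len(record.genotypes))
```

Here the TRIOF line has 12 columns, which matches 2L+6 with L=3 SNPs. A line with an odd number of allele columns would raise a ValueError.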
The MAP file describes the SNPs. The MAP file corresponding to the example dataset is:
7 SNP1 0 123
7 SNP2 0 456
7 SNP3 0 789
This file can be SPACE or TAB delimited. Each line corresponds to a SNP. The 4 columns are:
- Chromosome number [integer]
- SNP ID [string]
- SNP genetic position (cM) [float]
- SNP physical position (bp) [integer]
This file should have L lines and 4 columns, where L is the number of SNPs contained in the dataset.
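As a rough illustration of how the MAP file relates to the PED file, the sketch below reads the three example MAP lines and derives the column count a matching PED file should have. The names MapRecord, parse_map_line and expected_ped_columns are hypothetical, introduced only for this example. The chromosome is kept as a string here on the assumption that non-numeric codes such as X or Y may appear.

```python
from typing import NamedTuple


class MapRecord(NamedTuple):
    """One MAP line describing a single SNP."""
    chromosome: str       # kept as text (assumption: codes like X or Y may occur)
    snp_id: str
    genetic_pos_cm: float
    physical_pos_bp: int


def parse_map_line(line: str) -> MapRecord:
    fields = line.split()  # SPACE or TAB delimited
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}")
    chromosome, snp_id, cm, bp = fields
    return MapRecord(chromosome, snp_id, float(cm), int(bp))


def expected_ped_columns(n_snps: int) -> int:
    # 6 leading columns plus two allele columns per SNP.
    return 2 * n_snps + 6


map_text = """7 SNP1 0 123
7 SNP2 0 456
7 SNP3 0 789"""

snps = [parse_map_line(line) for line in map_text.splitlines()]
print(len(snps), expected_ped_columns(len(snps)))
```

With the L=3 SNPs of the example, every line of the corresponding PED file should have 12 columns.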
PED/MAP to BED/BIM/FAM conversion
To convert myPlinkTextData.ped and myPlinkTextData.map to Plink binary format (BED/BIM/FAM), use Plink as follows:
plink --file myPlinkTextData --make-bed --out myPlinkBinaryData
PED/MAP to GEN/SAMPLE conversion
To convert myPlinkTextData.ped and myPlinkTextData.map to GEN/SAMPLE format, use the GTOOL software package (available here) as follows:
gtool -P --ped myPlinkTextData.ped --map myPlinkTextData.map --og myGtoolTextData.gen --os myGtoolTextData.sample
Note that GTOOL has more options to tweak the conversion between formats.