Tuesday, 06 November 2012 at 7:56 pm

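The GL and PL genotype fields in the .VCF file format contain the likelihood of each possible genotype that can be formed from the REF and ALT alleles (columns 4 and 5 of the file). GL stores these as log10-scaled likelihoods and PL as phred-scaled likelihoods rounded to integers.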
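The spec defines PL as the phred-scaled counterpart of GL, so the two scales convert with PL = -10 × GL (rounded to the nearest integer). Below is a minimal sketch in Python that assumes plain lists of numbers. The function names are hypothetical. The optional normalization step shifts the values so the most likely genotype gets PL 0. Many variant callers emit PL that way, but the conversion itself does not require it.

```python
def pl_to_gl(pl_values):
    """Convert phred-scaled likelihoods (PL) to log10-scaled likelihoods (GL): GL = -PL / 10."""
    return [-pl / 10.0 for pl in pl_values]


def gl_to_pl(gl_values, normalize=True):
    """Convert GL to PL = round(-10 * GL).

    If normalize is True (an assumption mirroring common caller output),
    shift all values so the most likely genotype gets PL 0.
    """
    pls = [int(round(-10 * gl)) for gl in gl_values]
    if normalize:
        best = min(pls)
        pls = [pl - best for pl in pls]
    return pls


print(pl_to_gl([0, 30, 300]))
print(gl_to_pl([-1.0, -4.0, -31.0]))  # -> [0, 30, 300]
```

Because of rounding and normalization, converting GL to PL and back gives only an approximation of the original GL values.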

The ordering of the allele combinations is defined in the VCF 4.1 specification as:

If A is the allele in REF and B,C,... are the alleles as ordered in ALT, the ordering of genotypes for the likelihoods is given by: F(j/k) = (k*(k+1)/2)+j.  In other words, for biallelic sites the ordering is: AA,AB,BB; for triallelic sites the ordering is: AA,AB,BB,AC,BC,CC, etc. 
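Here is a minimal Python sketch of the spec's formula. It treats j and k as 0-based allele indices (0 = REF, 1 = first ALT, and so on) with j ≤ k. The helper names genotype_index and genotype_from_index are hypothetical and do not come from any library. The inverse is included because it is what you need when you read a GL/PL list position by position.

```python
def genotype_index(j, k):
    """Position of genotype j/k in the GL/PL list: F(j/k) = k*(k+1)/2 + j.

    j and k are 0-based allele indices (0 = REF) with j <= k.
    """
    if not 0 <= j <= k:
        raise ValueError("expected 0 <= j <= k")
    return k * (k + 1) // 2 + j


def genotype_from_index(index):
    """Inverse of genotype_index: recover the allele pair (j, k) for a list position."""
    k = 0
    # Find the largest k whose block of genotypes starts at or before index.
    while (k + 1) * (k + 2) // 2 <= index:
        k += 1
    return index - k * (k + 1) // 2, k


# Triallelic site: the six genotypes in likelihood order.
for idx in range(6):
    print(idx, genotype_from_index(idx))
# prints 0 (0, 0), 1 (0, 1), 2 (1, 1), 3 (0, 2), 4 (1, 2), 5 (2, 2)
```

With A=0, B=1 and C=2, those pairs read AA, AB, BB, AC, BC, CC, which matches the order quoted from the spec.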

To generate this ordering, you can use this Python snippet:

bases = ['A', 'G', 'C']  # the first base, A, is the REF base; G and C are the ALT bases
for i in range(len(bases)):
    base = bases[i]
    for a in range(i + 1):  # pair each earlier (or same) base with the current one
        print(bases[a] + base)

Running this snippet will output the correct order:

AA
AG
GG
AC
GC
CC
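The same loop can be used to label an actual PL (or GL) field. The following is a minimal sketch that assumes a diploid sample and takes the raw REF, ALT and PL strings as input. The function names are hypothetical, and the PL values in the example are made up. Labels are joined with "/" so that multi-base alleles such as indels stay readable. Missing values (".") are not handled.

```python
def genotype_labels(ref, alts):
    """Diploid genotype labels in VCF likelihood order (AA, AB, BB, AC, BC, CC, ...)."""
    alleles = [ref] + list(alts)
    labels = []
    for k in range(len(alleles)):   # second allele
        for j in range(k + 1):      # first allele, j <= k
            labels.append(alleles[j] + "/" + alleles[k])
    return labels


def label_likelihoods(ref, alt_field, pl_field, cast=int):
    """Pair each value of a PL field (or GL, with cast=float) with its genotype label.

    ref: the REF column, e.g. "A"; alt_field: the ALT column, e.g. "G,C";
    pl_field: one sample's PL value, e.g. "0,30,300,25,320,310" (made-up numbers).
    """
    labels = genotype_labels(ref, alt_field.split(","))
    values = [cast(v) for v in pl_field.split(",")]
    if len(values) != len(labels):
        raise ValueError("expected %d values, got %d" % (len(labels), len(values)))
    return dict(zip(labels, values))


print(label_likelihoods("A", "G,C", "0,30,300,25,320,310"))
# prints {'A/A': 0, 'A/G': 30, 'G/G': 300, 'A/C': 25, 'G/C': 320, 'C/C': 310}
```

The length check catches the most common mistake, which is pairing a PL field with the wrong ALT list. For n alleles, a diploid site has n(n+1)/2 genotypes.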
