TY - JOUR
T1 - Imputation of genotypes from low density (50,000 markers) to high density (700,000 markers) of cows from research herds in Europe, North America, and Australasia using 2 reference populations
AU - Pryce, JE
AU - Johnston, J
AU - Hayes, BJ
AU - Sahana, G
AU - Weigel, KA
AU - McParland, S
AU - Spurlock, D
AU - Krattenmacher, N
AU - Spelman, RJ
AU - Wall, E
AU - Calus, MPL
N1 - 1023378
PY - 2014
Y1 - 2014
N2 - Combining data from research herds may be advantageous,
especially for difficult or expensive-to-measure
traits (such as dry matter intake). Cows in research
herds are often genotyped using low-density single
nucleotide polymorphism (SNP) panels. However, the
precision of quantitative trait loci detection in genome-wide
association studies and the accuracy of genomic
selection may increase when the low-density genotypes
are imputed to higher density. Genotype data
were available from 10 research herds: 5 from Europe
[Denmark, Germany, Ireland, the Netherlands, and the
United Kingdom (UK)], 2 from Australasia (Australia
and New Zealand), and 3 from North America (Canada
and the United States). Heifers from the Australian
and New Zealand research herds were already genotyped
at high density (approximately 700,000 SNP).
The remaining genotypes were imputed from around
50,000 SNP to 700,000 using 2 reference populations.
Although it was not possible to use a combined reference
population, which would probably result in the
highest accuracies of imputation, differences arising
from using 2 high-density reference populations on imputing
50,000-marker genotypes of 583 animals (from
the UK) were quantified. The European genotypes (n
= 4,097) were imputed as 1 data set, using a reference
population of 3,150 that included genotypes from 835
Australian and 1,053 New Zealand females, with the remainder
being males. Imputation was undertaken using
population-wide linkage disequilibrium with no family
information exploited. The UK animals were also included
in the North American data set (n = 1,579) that
was imputed to high density using a reference population
of 2,018 bulls. After editing, 591,213 genotypes
on 5,999 animals from 10 research herds remained.
The correlation between imputed allele frequencies of
the 2 imputed data sets was high (>0.98) and even
stronger (>0.99) for the UK animals that were part of
each imputation data set. For the UK genotypes, 2.2%
were imputed differently in the 2 high-density reference
data sets used. Only 0.025% of these were homozygous
switches. The number of discordant SNP was lower for
animals that had sires that were genotyped. Discordant
imputed SNP genotypes were most common when a
large difference existed in allele frequency between the
2 imputed genotype data sets. For SNP that had ≥20%
discordant genotypes, the difference between imputed
data sets of allele frequencies of the UK (imputed)
genotypes was 0.07, whereas the difference in allele
frequencies of the (reference) high-density genotypes
was 0.30. In fact, regions existed across the genome
where the frequency of discordant SNP was higher. For
example, on chromosome 10 (centered on 520,948 bp),
52 SNP (out of a total of 103 SNP) had ≥20% discordant
genotypes. Four hundred and eight SNP had more than
20% discordant genotypes and were removed from the
final set of imputed genotypes. We concluded that both
discordance of imputed SNP genotypes and differences
in allele frequencies, after imputation using different
reference data sets, may be used to identify and remove
poorly imputed SNP.
KW - High-density genotyping
KW - Imputation
U2 - 10.3168/jds.2013-7368
DO - 10.3168/jds.2013-7368
M3 - Article
SN - 1525-3198
VL - 97
SP - 1799
EP - 1811
JO - Journal of Dairy Science
JF - Journal of Dairy Science
IS - 3
ER -