Target SNP selection in complex disease association studies

Matthias Wjst

doi:10.1186/1471-2105-5-92

BMC Bioinformatics. 2004 Jul 12;5:92. doi: 10.1186/1471-2105-5-92.

Author

Matthias Wjst¹

Affiliation

¹ Gruppe Molekulare Epidemiologie, Institut für Epidemiologie, GSF - Forschungszentrum für Umwelt und Gesundheit, Ingolstädter Landstrasse 1, D-85758 Neuherberg/Munich, Germany. wjst@gsf.de

Abstract

Background: The massive amount of SNP data stored at public internet sites provides unprecedented access to human genetic variation. Target SNPs for disease-gene association studies are currently selected more or less at random, because decision rules for selecting functionally relevant SNPs are not available.

Results: We implemented a computational pipeline that retrieves the genomic sequence of target genes, collects information about sequence variation and selects functional motifs containing SNPs. The motifs considered are gene promoters, exon-intron structure, AU-rich mRNA elements, transcription factor binding motifs, and cryptic and enhancer splice sites, together with expression in the target tissue. As a case study, 396 genes on chromosome 6p21 in the extended HLA region were selected, which contributed nearly 20,000 SNPs. Computer annotation identified ~2,500 SNPs in functional motifs. Most of these SNPs disrupt transcription factor binding sites, but only those introducing new sites had a significant depressing effect on SNP allele frequency. Other decision rules concern position within motifs, the validity of SNP database entries, unique occurrence in the genome and conserved sequence context in other mammalian genomes.
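The distinction between SNPs that disrupt an existing motif and SNPs that introduce a new one can be illustrated with a minimal sketch. It is not the pipeline described above: the function names, the exact-string matching and the example motif (the pentamer ATTTA, the DNA form of the AUUUA core commonly associated with AU-rich elements) are assumptions chosen for illustration, and real transcription factor binding sites would normally be scored with position weight matrices rather than exact matches.

```python
def motif_hits(seq, motif):
    """Return the set of 0-based start positions where motif occurs in seq."""
    width = len(motif)
    return {i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif}


def classify_snp(seq, pos, alt, motif):
    """Classify the effect of a single-base substitution on exact motif matches.

    seq   -- reference sequence (hypothetical input, forward strand only)
    pos   -- 0-based position of the SNP within seq
    alt   -- alternative allele (one base)
    motif -- motif as an exact string (an assumption; no degenerate bases)

    Returns "disrupts", "creates", "both" or "none".
    """
    ref_seq = seq.upper()
    motif = motif.upper()
    alt_seq = ref_seq[:pos] + alt.upper() + ref_seq[pos + 1:]

    def overlapping(hits):
        # Only motif occurrences that cover the SNP position can change.
        return {i for i in hits if i <= pos < i + len(motif)}

    before = overlapping(motif_hits(ref_seq, motif))
    after = overlapping(motif_hits(alt_seq, motif))
    lost, gained = before - after, after - before

    if lost and gained:
        return "both"
    if lost:
        return "disrupts"
    if gained:
        return "creates"
    return "none"


if __name__ == "__main__":
    # The T>C change at position 5 removes the ATTTA match starting at position 3.
    print(classify_snp("GGCATTTAGG", 5, "C", "ATTTA"))
    # The reverse change (C>T) introduces the same match.
    print(classify_snp("GGCATCTAGG", 5, "T", "ATTTA"))
```

Restricting the comparison to motif occurrences that overlap the SNP position keeps unrelated matches elsewhere in the sequence from influencing the classification.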

Conclusion: Only 10% of all gene-based SNPs have sequence-predicted functional relevance, making them a primary target for genotyping in association studies.

Publication types

Comparative Study

MeSH terms

Amino Acid Substitution / genetics
Chromosome Mapping / methods
Chromosomes, Human, Pair 6 / genetics
Computational Biology / methods
Databases, Genetic
Exons / genetics
Genes / genetics
Genetic Linkage / genetics*
Genetic Variation / genetics
Genome, Human
Humans
Introns / genetics
Mutation, Missense / genetics
Polymorphism, Single Nucleotide / genetics*