Stata commands for moving data between PHASE and HaploView
Stata Conference DC ‘09, July 30-31, 2009
John Charles “Chuck” Huber Jr, PhD
Assistant Professor of Biostatistics
Department of Epidemiology and Biostatistics
School of Rural Public Health
Texas A&M Health Science Center
Many rapidly growing areas of research utilize multiple specialty “boutique” computer programs to conduct highly specialized analyses.
The Stata user is faced with two choices:
1. Write new Stata commands that do the same analyses
2. Write Stata commands that efficiently export and import data for these “boutique” programs
Stata for Genetic Data Analysis
Outline
1. Genetic Data Analysis using Stata
2. Genetics Background
3. The “file” commands in Stata
4. The phasein and phaseout commands
5. The HaploView program
6. The haploviewout command
7. Summary
Stata for Genetic Data Analysis
2007 UK Stata Users Group meeting: http://www.stata.com/meeting/13uk/
A brief introduction to genetic epidemiology using Stata
Neil Shephard, University of Sheffield
An overview of using Stata to perform candidate gene association analysis will be presented. Areas covered will include data manipulation, Hardy–Weinberg equilibrium, calculating and plotting linkage disequilibrium, estimating haplotypes, and interfacing with external programs.
User Written Genetics Commands
Programs written by David Clayton
• ginsheet - Read genotype data from text files.
• gloci - Make a list of loci.
• greshape - Reshape a file containing genotypes to a file of alleles.
• gtab - Tabulate allele frequencies within genotypes and generate indicators (performs Hardy-Weinberg Equilibrium testing).
• gtype - Create a single genotype variable from two allele variables.
• htype - Create a haplotype variable from allele variables.
• mltdt - Multiple locus TDT for haplotype tagging SNPs (htSNPs).
• origin - Analysis of parental origin effect in TDT trios.
• pseudocc - Create a pseudo-case-control study from case-parent trios.
• pscc - Experimental version of pseudocc in which there may be several groups of linked loci.
• pwld - Pairwise linkage disequilibrium measures.
• rclogit - Conditional logistic regression with robust standard errors.
• snp2hap - Infer haplotypes of 2-locus SNP markers.
• tdt - Classical TDT test.
• trios - Tabulate genotypes of parent-offspring trios.
Programs written by Adrian Mander
• gipf - Graphical representation of log-linear models.
• hapipf - Haplotype frequency estimation using an EM algorithm and log-linear modelling.
• pedread - Reads a pedigree data file (in pre-Makeped LINKAGE format), similar to ginsheet.
• pedsumm - Summarises a pre-Makeped LINKAGE file that is currently in Stata's memory.
• pedraw - Draws one pedigree in the graphics window.
• plotmatrix - Produces LD heatmaps that display the strength of LD between markers.
• profhap - Calculates profile likelihood confidence intervals for results from hapipf.
• swblock - A step-wise hapipf routine to identify the parsimonious model that describes the haplotype block pattern.
• qhapipf - Analysis of quantitative traits using regression and log-linear modelling when phase is unknown.
• hapblock - Attempts to find the edge of areas containing high LD within a set of loci.
Programs written by Mario Cleves
• gencc - Genetic case-control tests.
• genhw - Hardy-Weinberg Equilibrium tests.
• qtlsnp - A program for testing associations between SNPs and a quantitative trait.
Programs written by Catherine Saunders
• co_power - Power calculations for Case-only study designs.
• gei_matching -
• geipower - Power calculations for Gene-Environment interactions.
• ggipower - Power calculations for Gene-Gene interactions.
• tdt_geipower - Power calculations for Gene-Environment interactions via TDT analysis.
• tdt_ggipower - Power calculations for Gene-Gene interactions via TDT analysis.
Programs written by Neil Shephard
• genass - Performs a number of statistical tests on your genotypic data and collates the results into a Stata formatted data set for browsing.
The Post-Genome Era
February 16, 2001
February 15, 2001
Scientific Method: Observe
Hartl & Jones (1998) pg 18, Figure 1.13
Scientific Method: Predict
Watson et al. (2004) pg 29, Box 2-2
Scientific Method: Manipulate
The Structure of DNA
Hartl & Jones (1998) pg 9, Figure 1.5
The Structure of DNA
Watson et al. (2004) pg 23, Figure 2.5
What is a SNP?
• A SNP is a single nucleotide polymorphism (the individual nucleotides are called alleles)
Allelic Association
• Simple 2x2 table
• One table per SNP
• Compute a simple chi-squared statistic or odds ratio for each SNP

SNP1 Allele
             g      c
Case       250    750
Control    650    350
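The SNP1 allele table can be worked through by hand. The following is a minimal sketch in Python (not one of the Stata commands from the talk) that computes the Pearson chi-squared statistic without continuity correction and the odds ratio for a 2x2 table. The function name allelic_association is hypothetical, and the 1 degree of freedom p-value uses the standard identity P(X > x) = erfc(sqrt(x/2)).

```python
import math

def allelic_association(table):
    """Pearson chi-squared (1 df, no continuity correction) and odds ratio.

    table = [[a, b], [c, d]]: rows are case/control, columns are the two alleles.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    odds_ratio = (a * d) / (b * c)
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, odds_ratio, p_value

# Counts from the SNP1 allele table: columns are alleles g and c.
snp1 = [[250, 750], [650, 350]]
chi2, odds_ratio, p_value = allelic_association(snp1)
print(f"chi2 = {chi2:.2f}, OR = {odds_ratio:.3f}, p = {p_value:.3g}")
```

For these counts the statistic is about 323.2 and the odds ratio for allele g is about 0.18, so g is much less common among cases than among controls.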
Genotypic Association
• Compute chi-squared tests
• Allows testing of various disease models (dominant, recessive, additive)

SNP1 Genotype
            gg     gc     cc
Case       100    250    150
Control    300    150     50
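As a rough illustration of how the genotype table supports different disease models, the sketch below (in Python, not Stata) computes the general 2 degree of freedom chi-squared test and then collapses the columns into dominant and recessive groupings. Treating c as the risk allele is an assumption made only for this example, since the slide does not say which allele is modelled. The helper names pearson_chi2 and upper_tail are hypothetical, and the additive model (usually tested with a trend test) is not shown.

```python
import math

def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def upper_tail(chi2, df):
    """Chi-squared upper-tail probability; closed forms for df = 1 and 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("only df = 1 or 2 are handled in this sketch")

# Rows: case, control. Columns: gg, gc, cc (counts from the slide).
genotypes = [[100, 250, 150], [300, 150, 50]]

# Assumption for illustration: c is the allele whose effect is being modelled.
models = {
    "general": genotypes,
    "c dominant (gc+cc vs gg)": [[gc + cc, gg] for gg, gc, cc in genotypes],
    "c recessive (cc vs gg+gc)": [[cc, gg + gc] for gg, gc, cc in genotypes],
}
results = {name: pearson_chi2(tab) for name, tab in models.items()}
for name, (chi2, df) in results.items():
    print(f"{name}: chi2 = {chi2:.2f}, df = {df}, p = {upper_tail(chi2, df):.3g}")
```

Collapsing to a 2x2 table spends one degree of freedom instead of two, which is why a correctly chosen model can be more powerful than the general test.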
What is a Haplotype?
• A haplotype is the combination of one or more alleles found on the same chromosome
– Person 1 has a “gc” haplotype and a “ca” haplotype
– Person 2 has a “cc” haplotype and a “ga” haplotype
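The two people in this example carry different haplotypes but the same alleles at each SNP, so genotyping alone cannot tell them apart. The short Python sketch below (not Stata) makes that concrete: it collapses each person's haplotypes into unphased genotypes and then lists every haplotype pair compatible with a genotype. The function names are hypothetical and each haplotype is assumed to be a string with one character per SNP.

```python
from itertools import product

def unphased_genotype(hap1, hap2):
    """Collapse two haplotypes into per-SNP unordered allele pairs (phase is lost)."""
    return tuple("".join(sorted(pair)) for pair in zip(hap1, hap2))

def compatible_haplotype_pairs(genotype):
    """Enumerate every unordered haplotype pair that could produce the genotype."""
    pairs = set()
    # For each SNP, choose which of its two alleles sits on the first chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(g[c] for g, c in zip(genotype, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

person1 = unphased_genotype("gc", "ca")
person2 = unphased_genotype("cc", "ga")
print(person1, person2, person1 == person2)
print(compatible_haplotype_pairs(person1))
```

Both people reduce to the same genotype, and that genotype is compatible with either pair of haplotypes. Resolving this ambiguity from population data is the haplotype reconstruction problem treated in the Stephens and Donnelly papers listed in the references.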
Compared to recreating “boutique” programs in Stata, it is relatively easy to create programs for exporting and importing data.
Acknowledgements
• Grant 1-R01DK073618-02 from the National Institute of Diabetes and Digestive and Kidney Diseases
• Grant 2006-35205-16715 from the United States Department of Agriculture.
• Drs. Loren Skow, Krista Fritz, Candice Brinkmeyer-Langford of the Texas A&M College of Veterinary Medicine
• Roger Newson of the Imperial College London
References
• Barrett, J., Fry, B., Maller, J., & Daly, M. (2005). Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 263-265.
• Hartl, D.L., & Jones, E.W. (1998). Genetics: Principles and Analysis, 4th Ed. Jones & Bartlett Publishers.
• Stephens, M., & Donnelly, P. (2003). A Comparison of Bayesian Methods for Haplotype Reconstruction from Population Genotype Data. American Journal of Human Genetics, 73, 1162–1169.
• Stephens, M., Smith, N. J., & Donnelly, P. (2001). A New Statistical Method for Haplotype Reconstruction from Population Data. American Journal of Human Genetics, 68, 978–989.
• Watson, J.D., Baker, T.A., Bell, S.P., Gann, A., Levine, M., & Losick, R. (2004). Molecular Biology of the Gene, 5th Ed. Benjamin Cummings.