Identifying dominant gene candidates with GEMINI

Aaron Quinlan
University of Utah
quinlanlab.org

Please refer to the following GitHub Gist to find each command for this session. Commands should be copied and pasted from this Gist:
https://gist.github.com/arq5x/9e1928638397ba45da2e#file-autosomal-dominant-sh
Note: copy and paste the full command from the Github Gist
347 candidates
Use ESP and ExAC to focus on rare variants

$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    --filter "filter is NULL
              and impact_severity != 'LOW'
              and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL)
              and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" \
    trio.trim.vep.dominant.db \
    | wc -l

40 candidates
Let’s be more strict with the functional consequence

$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    --filter "filter is NULL
              and impact_severity == 'HIGH'
              and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL)
              and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" \
    trio.trim.vep.dominant.db \
    | wc -l

Without the trailing "| wc -l", the same command lists the candidates:

chrom  start     end       ref  alt  gene     impact           cadd_raw  variant_id  family_id  family_members  family_genotypes  samples    family_count
chr17  28778816  28778817  T    C    CPD      stop_loss        1.09      11860       family1    1805,1847,4805  T/C,T/T,T/C       1805,4805  1
chr2   21236250  21236251  G    A    APOB     stop_gain        7.82      492         family1    1805,1847,4805  G/A,G/G,G/A       1805,4805  1
chr22  21363743  21363744  T    C    TUBA3FP  splice_acceptor  2.46      15272       family1    1805,1847,4805  T/C,T/T,T/C       1805,4805  1
Are any of these variants known to underlie a clinical phenotype?
How could you extend the query from the previous slide to answer this?
Hint: use the GEMINI documentation.
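One possible approach, sketched here rather than taken from the slides, is to add ClinVar columns to the --columns list. The column names clinvar_sig and clinvar_disease_name come from the GEMINI variants-table documentation; whether they are populated depends on how this particular database was loaded, which is an assumption. The script skips the call when gemini or the database file is not available.

```shell
#!/bin/sh
# Sketch: extend the previous autosomal_dominant query with ClinVar columns.
# Assumption: clinvar_sig and clinvar_disease_name are populated in this database.
db="trio.trim.vep.dominant.db"
columns="chrom, start, end, ref, alt, gene, impact, cadd_raw, clinvar_sig, clinvar_disease_name"
filter="filter is NULL
        and impact_severity == 'HIGH'
        and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL)
        and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)"

# Only run the query when both the tool and the database are present.
if command -v gemini >/dev/null 2>&1 && [ -f "$db" ]; then
    gemini autosomal_dominant --columns "$columns" --filter "$filter" "$db"
else
    echo "gemini or $db not found; skipping query" >&2
fi
```

A candidate with a non-empty clinvar_sig value would then be worth looking up in ClinVar directly.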
Custom analyses: Use the query module to identify autosomal dominant variants
Using the --gt-filter
We need to use the query module to enforce an autosomal dominant inheritance pattern.
Using the --gt-filter

$ gemini query \
    -q "SELECT chrom, start, end, ref, alt, gene, impact, (gts).(*) \
        FROM variants" \
    --header \
    --gt-filter "gt_types.4805 == HET \
                 and gt_types.1805 == HET \
                 and gt_types.1847 == HOM_REF" \
    trio.trim.vep.dominant.db \
    | head \
    | column -t

chrom  start    end      ref  alt  gene        impact       gts.1805  gts.1847  gts.4805
chr2   229933   229934   A    G    SH3YL1      intron       A/G       A/A       A/G
chr2   277249   277250   G    A    ACP1        downstream   G/A       G/G       G/A
chr2   675830   675831   G    T    TMEM18      intron       G/T       G/G       G/T
chr2   905441   905442   A    T    AC113607.2  upstream     A/T       A/A       A/T
chr2   1636903  1636904  C    T    PXDN        UTR_3_prime  C/T       C/C       C/T
chr2   1642789  1642790  T    C    PXDN        intron       T/C       T/T       T/C
chr2   3653841  3653842  C    A    COLEC11     intron       C/A       C/C       C/A
chr2   3660868  3660869  G    A    COLEC11     intron       G/A       G/G       G/A
chr2   3661001  3661002  G    A    COLEC11     intron       G/A       G/G       G/A
What about large pedigrees or multiple families? --gt-filter “wildcards”
Using the --gt-filter “wildcards”

Affected individuals must be HET
Unaffected individuals must be HOM_REF

chrom  start    end      ref  alt  gene        impact       gts.1805  gts.1847  gts.4805
chr2   229933   229934   A    G    SH3YL1      intron       A/G       A/A       A/G
chr2   277249   277250   G    A    ACP1        downstream   G/A       G/G       G/A
chr2   675830   675831   G    T    TMEM18      intron       G/T       G/G       G/T
chr2   905441   905442   A    T    AC113607.2  upstream     A/T       A/A       A/T
chr2   1636903  1636904  C    T    PXDN        UTR_3_prime  C/T       C/C       C/T
chr2   1642789  1642790  T    C    PXDN        intron       T/C       T/T       T/C
chr2   3653841  3653842  C    A    COLEC11     intron       C/A       C/C       C/A
chr2   3660868  3660869  G    A    COLEC11     intron       G/A       G/G       G/A
chr2   3661001  3661002  G    A    COLEC11     intron       G/A       G/G       G/A
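The command for this slide is not in the transcript. A plausible reconstruction, offered as a sketch and not as the exact command from the Gist, reuses the wildcard syntax of the next command without its depth condition. It assumes the usual PED coding in which phenotype 2 means affected and phenotype 1 means unaffected, and it skips the call when gemini or the database file is not available.

```shell
#!/bin/sh
# Sketch: select samples by phenotype instead of naming each sample ID.
# Assumption: phenotype==2 is affected and phenotype==1 is unaffected.
db="trio.trim.vep.dominant.db"
query="SELECT chrom, start, end, ref, alt, gene, impact, (gts).(*) FROM variants"
gt_filter="(gt_types).(phenotype==2).(==HET).(all) and (gt_types).(phenotype==1).(==HOM_REF).(all)"

# Only run the query when both the tool and the database are present.
if command -v gemini >/dev/null 2>&1 && [ -f "$db" ]; then
    gemini query -q "$query" --header --gt-filter "$gt_filter" "$db" | head | column -t
else
    echo "gemini or $db not found; skipping query" >&2
fi
```

Because the filter never mentions a sample ID, the same command should carry over to larger pedigrees or several families without editing.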
You can apply multiple --gt-filter “wildcards”

$ gemini query \
    -q "SELECT chrom, start, end, ref, alt, gene, impact, \
        (gts).(*), (gt_depths).(*) \
        FROM variants" \
    --header \
    --gt-filter "(gt_types).(phenotype==2).(==HET).(all) \
                 and (gt_types).(phenotype==1).(==HOM_REF).(all) \
                 and (gt_depths).(*).(>=20).(all)" \
    trio.trim.vep.dominant.db \
    | head \
    | column -t

Affected individuals must be HET
Unaffected individuals must be HOM_REF
Everyone must have sequence depth >= 20

chrom  start    end      ref  alt  gene        impact       gts.1805  gts.1847  gts.4805  gt_depths.1805  gt_depths.1847  gt_depths.4805
chr2   229933   229934   A    G    SH3YL1      intron       A/G       A/A       A/G       161             225             231
chr2   277249   277250   G    A    ACP1        downstream   G/A       G/G       G/A       183             237             234
chr2   905441   905442   A    T    AC113607.2  upstream     A/T       A/A       A/T       90              142             246
chr2   1636903  1636904  C    T    PXDN        UTR_3_prime  C/T       C/C       C/T       54              87              174
chr2   1642789  1642790  T    C    PXDN        intron       T/C       T/T       T/C       74              120             179
chr2   3653841  3653842  C    A    COLEC11     intron       C/A       C/C       C/A       123             223             199
chr2   3660868  3660869  G    A    COLEC11     intron       G/A       G/G       G/A       76              96              250
chr2   3661001  3661002  G    A    COLEC11     intron       G/A       G/G       G/A       70              86              182
chr2   6879884  6879885  T    A    LINC00487   intron       T/A       T/T       T/A       83              110             106
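As a quick sanity check on the depth condition, one could confirm that no returned row has a sample below depth 20. The sketch below does this with awk on three rows copied from the output for this query; on a real run the same awk expression could be applied to the full output instead of the hard-coded rows.

```shell
#!/bin/sh
# Three rows copied from the query output; columns 11-13 are
# gt_depths.1805, gt_depths.1847 and gt_depths.4805.
rows="chr2 229933 229934 A G SH3YL1 intron A/G A/A A/G 161 225 231
chr2 1636903 1636904 C T PXDN UTR_3_prime C/T C/C C/T 54 87 174
chr2 6879884 6879885 T A LINC00487 intron T/A T/T T/A 83 110 106"

# Count rows in which any of the three depth columns falls below 20.
low_depth=$(printf '%s\n' "$rows" \
    | awk '$11 < 20 || $12 < 20 || $13 < 20 { n++ } END { print n + 0 }')

echo "rows with a sample below depth 20: $low_depth"
```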
We have the inheritance model; now apply the annotation filters

$ gemini query \
    -q "SELECT chrom, start, end, ref, alt, gene, impact, \
        (gts).(*), (gt_depths).(*) \
        FROM variants \
        WHERE filter is NULL \
        and impact_severity == 'HIGH' \
        and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) \
        and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" \
    --header \
    --gt-filter "(gt_types).(phenotype==2).(==HET).(all) \
                 and (gt_types).(phenotype==1).(==HOM_REF).(all) \
                 and (gt_depths).(*).(>=20).(all)" \
    trio.trim.vep.dominant.db \
    | column -t

chrom  start     end       ref  alt  gene     impact           gts.1805  gts.1847  gts.4805  gt_depths.1805  gt_depths.1847  gt_depths.4805
chr2   21236250  21236251  G    A    APOB     stop_gain        G/A       G/G       G/A       46              72              112
chr17  28778816  28778817  T    C    CPD      stop_loss        T/C       T/T       T/C       144             171             207
chr22  21363743  21363744  T    C    TUBA3FP  splice_acceptor  T/C       T/T       T/C       36              56              241

Note that (pleasingly) these are the same three candidates as detected with the autosomal_dominant tool.
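The agreement between the two approaches can also be checked mechanically. A minimal sketch, using the coordinates and gene names copied from the two result tables in this session (on a real run they would be cut from the two command outputs instead):

```shell
#!/bin/sh
# chrom, start, end and gene as reported by the autosomal_dominant tool.
tool_hits="chr17 28778816 28778817 CPD
chr2 21236250 21236251 APOB
chr22 21363743 21363744 TUBA3FP"

# The same four fields as reported by the query module.
query_hits="chr2 21236250 21236251 APOB
chr17 28778816 28778817 CPD
chr22 21363743 21363744 TUBA3FP"

# Sort both lists the same way so that row order does not matter.
sorted_tool=$(printf '%s\n' "$tool_hits" | sort)
sorted_query=$(printf '%s\n' "$query_hits" | sort)

if [ "$sorted_tool" = "$sorted_query" ]; then
    same="yes"
else
    same="no"
fi
echo "identical candidate sets: $same"
```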
Load the following files into IGV (Load from URL) and inspect your candidates