Genomedata, Segway and Segtools: How to use the Segway pipeline to store and analyze genomics data sets

Max Libbrecht

Sep 15, 2018


Genomedata Segway Segtools


Installing Genomedata

# HDF5
# Ubuntu/Debian:
sudo apt-get install libhdf5-serial-dev hdf5-tools
# CentOS/RHEL/Fedora:
sudo yum -y install hdf5 hdf5-devel
# OpenSUSE:
sudo zypper in hdf5 hdf5-devel libhdf5

# Pytables
pip install numpy
pip install numexpr
pip install cython

# Genomedata
pip install genomedata


Loading data into genomedata

genomedata-load-assembly --sizes my_genomedata hg19.sizes
genomedata-open-data my_genomedata my_trackname
zcat input.bedgraph.gz | genomedata-load-data my_genomedata my_trackname
genomedata-close-data my_genomedata

hg19.sizes:
chr1 249250621
chr2 243199373
chr3 198022430
chr4 191154276
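The hg19.sizes file above is plain text with one chromosome name and one length per line. A minimal Python sketch for producing that layout from a dictionary is given here. The helper name format_sizes is hypothetical and not part of Genomedata, and the tab default is an assumption based on the usual UCSC chrom.sizes convention; pass sep=" " to match the slide exactly.

```python
# Hypothetical helper (not part of Genomedata): format chromosome sizes
# in the "name length" layout of the hg19.sizes file shown above.
CHROM_SIZES = {
    "chr1": 249250621,
    "chr2": 243199373,
    "chr3": 198022430,
    "chr4": 191154276,
}


def format_sizes(sizes, sep="\t"):
    """Return one 'chrom<sep>length' line per chromosome."""
    return "".join(f"{chrom}{sep}{length}\n" for chrom, length in sizes.items())


# Print the text that would be saved as hg19.sizes.
print(format_sizes(CHROM_SIZES), end="")
```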


Accessing data: command line

$ genomedata-query my_genomedata my_trackname chr1 1000000 1000100
fixedStep chrom=chr1 start=1000000
0.0
0.0
0.0
0.0
0.0
0.0
...


Accessing data: Python

>>> import genomedata
>>> g = genomedata.Genome("my_genomedata")
>>> g["chr1"][1000000:1000100, "my_trackname"]
array([ 17.89999962, 17.89999962, 17.89999962, 17.89999962, 17.89999962, 17.89999962, 17.89999962, 17.89999962, 17.89999962, 17.89999962], dtype=float32)
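Once a slice like the one above is in hand, a common next step is to summarize it. The sketch below assumes that positions without data come back as NaN and therefore skips them; the function name summarize_signal and the example values are illustrative only, and a plain list stands in for the array that Genomedata returns.

```python
import math


def summarize_signal(values):
    """Return (count, mean, minimum, maximum) over the non-NaN values."""
    present = [float(v) for v in values if not math.isnan(v)]
    if not present:
        return 0, math.nan, math.nan, math.nan
    return len(present), sum(present) / len(present), min(present), max(present)


# Stand-in for g["chr1"][1000000:1000100, "my_trackname"]; the values
# here are made up for illustration.
example = [17.9, 17.9, math.nan, 0.0, 2.1]
count, mean, low, high = summarize_signal(example)
print(count, round(mean, 3), low, high)
```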


[Figure: signal tracks DNase1, H3K36me3 and RNA-seq, with an annotation track beneath them whose segments are labeled 1 2 3 2 3 2.]

HMMSeg: Day et al., Bioinformatics, 2007
ChromHMM: Ernst, J. and Kellis, M., Nature Biotechnology, 2010
Segway: Hoffman, M. et al., Nature Methods, 2012

Semi-automated genome annotation algorithms partition and label the genome on the basis of functional genomics tracks
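To make "partition and label" concrete, the sketch below collapses a per-position label sequence into labeled segments. It is only an illustration of what an annotation looks like as data, not how any of the cited tools work; the function name labels_to_segments and the toy label sequence are assumptions.

```python
from itertools import groupby


def labels_to_segments(chrom, labels, start=0):
    """Collapse per-position labels into (chrom, start, end, label) tuples.

    Coordinates are 0-based and half-open, as in BED.
    """
    segments = []
    position = start
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((chrom, position, position + length, label))
        position += length
    return segments


# Toy per-position labels whose runs read 1 2 3 2 3 2.
per_position = [1, 1, 2, 2, 2, 3, 2, 2, 3, 3, 2]
for segment in labels_to_segments("chr1", per_position):
    print(*segment, sep="\t")
```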


[Figure: dynamic Bayesian network in which the segment label is a hidden random variable and the RNA-seq, H3K27me3 and DNase1 signals are observed random variables.]

Semi-automated genome annotation algorithms use dynamic Bayesian network models
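The simplest dynamic Bayesian network of this shape is a hidden Markov model: one hidden label per position, with the observed signal depending on that label. The sketch below decodes the most probable label path with the Viterbi algorithm for a single track with Gaussian emissions. It is a deliberately reduced stand-in, not Segway's actual model or inference code, and every name and parameter value in it is an assumption chosen for illustration.

```python
import math


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def viterbi(observations, means, sds, stay_prob):
    """Most probable label path for a one-track Gaussian HMM.

    Label k emits values from Normal(means[k], sds[k]). A label persists
    with probability stay_prob and otherwise switches uniformly to one of
    the other labels, so at least two labels are required.
    """
    n_labels = len(means)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_labels - 1))
    # Uniform prior over the first label.
    scores = [
        -math.log(n_labels) + log_gaussian(observations[0], means[k], sds[k])
        for k in range(n_labels)
    ]
    backpointers = []
    for x in observations[1:]:
        new_scores, pointers = [], []
        for k in range(n_labels):
            candidates = [
                scores[j] + (log_stay if j == k else log_switch)
                for j in range(n_labels)
            ]
            best = max(range(n_labels), key=candidates.__getitem__)
            pointers.append(best)
            new_scores.append(candidates[best] + log_gaussian(x, means[k], sds[k]))
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final label.
    label = max(range(n_labels), key=scores.__getitem__)
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    return path[::-1]


signal = [0.1, 0.0, 0.2, 5.1, 4.9, 5.0, 0.1, 0.0]
decoded = viterbi(signal, means=[0.0, 5.0], sds=[1.0, 1.0], stay_prob=0.9)
print(decoded)
```

Raising stay_prob makes label changes more expensive, which is one simple way a model of this kind can favor longer segments.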


Installing Segway

# GMTK
wget http://melodi.ee.washington.edu/downloads/gmtk/gmtk-1.4.0.tar.gz
tar -xzvf gmtk-1.4.0.tar.gz
cd gmtk-1.4.0
./configure
make
make install
cd ..

# Segway
pip install segway


Running Segway

segway train my_genomedata my_traindir
segway identify my_genomedata my_traindir my_identifydir

output: my_identifydir/segway.bed.gz
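The output file can be read with a few lines of Python. The sketch below assumes the file is a gzipped BED whose fourth column holds the segment label and which may begin with a track or browser header line; the function name read_segments is hypothetical, and the demo writes its own small file rather than reading real Segway output.

```python
import gzip
import os
import tempfile


def read_segments(path):
    """Yield (chrom, start, end, label) tuples from a gzipped BED file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines, comments, and browser/track header lines.
            if not fields or fields[0] in ("track", "browser") or line.startswith("#"):
                continue
            yield fields[0], int(fields[1]), int(fields[2]), fields[3]


# Demo on a made-up two-segment file standing in for segway.bed.gz.
demo = "track name=demo\nchr1\t0\t200\t3\nchr1\t200\t450\t1\n"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "segway.bed.gz")
    with gzip.open(demo_path, "wt") as handle:
        handle.write(demo)
    segments = list(read_segments(demo_path))
print(segments)
```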


Model parameters

Number of annotation labels
--num-labels=25

Number of EM initializations
--num-instances=10

Maximum number of EM training iterations
--max-train-rounds=100
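When several of these options are combined, it can help to assemble the command programmatically. The sketch below only builds and prints an argument list using the flags listed above; it does not run Segway. The helper name segway_train_command is hypothetical, and placing the options after "train" is an assumption that may need adjusting for the installed Segway version.

```python
import shlex


def segway_train_command(genomedata, traindir, **options):
    """Build an argv list for `segway train` from keyword options.

    Keyword names map to flags by replacing underscores with hyphens,
    so num_labels=25 becomes --num-labels=25.
    """
    flags = [f"--{name.replace('_', '-')}={value}" for name, value in options.items()]
    return ["segway", "train", *flags, genomedata, traindir]


command = segway_train_command(
    "my_genomedata",
    "my_traindir",
    num_labels=25,
    num_instances=10,
    max_train_rounds=100,
)
print(shlex.join(command))
```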


Input data

Input tracks
--track=GM12878_H3K27ac --track=GM12878_H3K4me3
OR
--tracks-from=tracks.txt

tracks.txt:
GM12878_H3K27ac
GM12878_H3K4me3

Genome coordinates
--include-coords=coords.bed

coords.bed:
chr1    151158060    151658060
chr10    55483812    55983812

--exclude-coords=blacklist.bed

Training minibatch size
--minibatch-fraction=0.01
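The two small input files above are easy to generate and sanity-check from Python. The sketch below formats the track list and the include-coords regions shown on the slide and reports how many bases the regions cover. The helper name format_bed is hypothetical, and tab-separated BED output is an assumption based on the usual BED convention.

```python
def format_bed(regions):
    """Return tab-separated BED lines for (chrom, start, end) regions."""
    lines = []
    for chrom, start, end in regions:
        if not 0 <= start < end:
            raise ValueError(f"invalid region: {chrom}:{start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\n")
    return "".join(lines)


# Values copied from the slide.
REGIONS = [("chr1", 151158060, 151658060), ("chr10", 55483812, 55983812)]
TRACKS = ["GM12878_H3K27ac", "GM12878_H3K4me3"]

coords_text = format_bed(REGIONS)  # contents for coords.bed
tracks_text = "".join(f"{name}\n" for name in TRACKS)  # contents for tracks.txt
total_bases = sum(end - start for _, start, end in REGIONS)
print(coords_text, tracks_text, total_bases, sep="")
```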


Controlling segment lengths

Downsampling resolution
--resolution=10

Long segments prior
--prior-strength=1.0

Weight on transition part of the model
--segtransition-weight-scale=10


Installing Segtools

pip install segtools


segtools-signal-distribution measures relationships between annotation labels and signal tracks

[Figure: signal distribution across annotation labels for the tracks H4K20me1, H3K79me2, H3K36me3, H3K4me3, H3K27ac, H3K4me1, H3K9me1 and H3K27me3.]
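The kind of relationship this tool reports can be illustrated with a per-label mean. The sketch below is not Segtools code; it is a minimal stand-in that assumes per-position labels aligned with per-position signal values, skips NaN values as missing data, and uses made-up numbers. The function name mean_signal_by_label is hypothetical.

```python
import math
from collections import defaultdict


def mean_signal_by_label(labels, signal):
    """Mean of the non-NaN signal values observed under each label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, value in zip(labels, signal):
        if math.isnan(value):
            continue
        totals[label] += value
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in counts}


# Toy data: label 1 marks the positions with high signal.
labels = [0, 0, 1, 1, 1, 0]
signal = [0.5, 1.5, 8.0, math.nan, 10.0, 1.0]
means = mean_signal_by_label(labels, signal)
print(means)
```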


segtools-length-distribution measures segment lengths and genome coverage

segtools-length-distribution segway.bed.gz
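The two quantities involved, segment length and coverage, are simple to compute by hand for a small annotation. The sketch below is an illustration of those definitions rather than the Segtools implementation; the function name length_summary and the toy segments are assumptions.

```python
from collections import defaultdict


def length_summary(segments):
    """Per-label segment count, mean length, and fraction of bases covered."""
    lengths = defaultdict(list)
    for _chrom, start, end, label in segments:
        lengths[label].append(end - start)
    total = sum(sum(values) for values in lengths.values())
    return {
        label: {
            "segments": len(values),
            "mean_length": sum(values) / len(values),
            "coverage": sum(values) / total,
        }
        for label, values in lengths.items()
    }


# Toy annotation: (chrom, start, end, label) with half-open coordinates.
toy_segments = [
    ("chr1", 0, 200, "0"),
    ("chr1", 200, 300, "1"),
    ("chr1", 300, 1000, "0"),
]
summary = length_summary(toy_segments)
for label, stats in sorted(summary.items()):
    print(label, stats)
```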


segtools-aggregation measures associations with other genome annotations

segtools-aggregation --normalize --mode=gene segway.bed.gz gencode.gff