
Practice retrieving data and running stand alone BLAST. 

Jan 01, 2016


zeus-alston

Transcript
Page 1: Practice retrieving data and running stand alone BLAST. 

Practice retrieving data and running stand alone BLAST. 

 Step 1. Identify genes in the ABA biosynthesis pathway from the Arabidopsis Cyc database

http://www.arabidopsis.org/biocyc/index.jsp

Step 2. Identify subject database
Vitis vinifera (nucleotide)
Solanum pennellii (EST)


Query: Select Pathway by name
Enter: Abscisic Acid
Submit


Now what?


abscisic acid biosynthesis    abscisic acid biosynthesis
AT4G18350                     AT4G18350
AT3G24220                     AT3G24220
AT1G78390                     AT1G78390
AT1G78390                     AT3G14440
AT4G18350                     AT1G30100
AT3G24220                     AT2G27150
AT3G14440                     AT1G52340
AT1G30100
AT2G27150
AT1G52340

Filter for unique sequences (EXCEL: Data, Filter, Advanced Filter…)
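
The same filtering can also be done on the command line. The following is only a sketch, assuming the gene IDs have been saved one per line in a text file; the file names and the function name unique_ids are hypothetical and not part of the class materials.

```shell
# unique_ids FILE: print each non-empty line of FILE once, in first-seen order.
unique_ids() {
    awk 'NF && !seen[$0]++' "$1"
}

# Hypothetical input file holding the AraCyc gene IDs, duplicates included.
printf '%s\n' AT4G18350 AT3G24220 AT1G78390 AT1G78390 AT4G18350 \
    AT3G24220 AT3G14440 AT1G30100 AT2G27150 AT1G52340 > aba_ids.txt

# Write the de-duplicated list to a second (hypothetical) file.
unique_ids aba_ids.txt > aba_ids_unique.txt
```

Unlike sort -u, the awk filter keeps the IDs in their original order, which matches what the Excel Advanced Filter does.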


Notepad++: EDIT, LINE OPERATIONS, JOIN LINES
SEARCH, REPLACE: replace “space” with “space OR space”
Paste into ENTREZ Nucleotide search…
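
For those working at a Unix prompt instead of Notepad++, the join-and-replace step might look like the sketch below. It assumes one ID per line in a text file; the function name join_with_or and the file name are hypothetical.

```shell
# join_with_or FILE: join the non-empty lines of FILE into a single line,
# separated by " OR ", ready to paste into the Entrez search box.
join_with_or() {
    awk 'NF { printf "%s%s", sep, $0; sep = " OR " } END { print "" }' "$1"
}

# Hypothetical three-ID demo file.
printf '%s\n' AT4G18350 AT3G24220 AT1G78390 > demo_ids.txt
join_with_or demo_ids.txt
```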


PERL

chomp;
next if /^\s/;           # skip lines that begin with whitespace
next if /^Gene/;         # skip the header line, which starts with "Gene"
my @temp = split /\t/;   # the data set is tab-delimited
$hash{$temp[0]} = 1;     # key on element 0 (the sequence ID) so each ID is stored only once

Then invoke BioPerl to query NCBI with the search string: TAIR:AT### AND “complete cds”

Where AT### are the unique accession numbers from AraCyc, and “complete cds” eliminates genomic sequence (e.g. the complete Arabidopsis thaliana chromosome 4).

See the complete script on the class site…
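
To make the search string concrete, here is a small sketch that only builds the text of each query; it does not contact NCBI and is not the class BioPerl script. The function names entrez_term and entrez_terms are hypothetical.

```shell
# entrez_term ID: print the search string for one AraCyc gene ID.
entrez_term() {
    printf 'TAIR:%s AND "complete cds"\n' "$1"
}

# entrez_terms FILE: print one search string per non-empty line of FILE.
entrez_terms() {
    while IFS= read -r id; do
        if [ -n "$id" ]; then entrez_term "$id"; fi
    done < "$1"
}

# Example with one of the ABA pathway IDs.
entrez_term AT4G18350
```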


Do we want this much sequence?


Use the push pin to highlight all boxes for mRNA (22 sequences) so we don’t get chromosome 4 genomic sequences


Try: Use Unix to verify that the file contains all the sequences…

Q: What command would you use?

A: $ grep -c ">" filename
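
The count can also be compared against the number of sequences you expected to download. This is a sketch only; the function names count_fasta and check_fasta and the demo file are hypothetical.

```shell
# count_fasta FILE: print the number of FASTA header lines (lines starting with ">").
count_fasta() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps that from being an error.
    grep -c '^>' "$1" || true
}

# check_fasta FILE EXPECTED: report whether FILE holds EXPECTED sequences.
check_fasta() {
    found=$(count_fasta "$1")
    if [ "$found" -eq "$2" ]; then
        echo "OK: $found sequences in $1"
    else
        echo "MISMATCH: expected $2, found $found in $1"
        return 1
    fi
}

# Tiny hypothetical demo file with two records.
printf '>seq1\nACGT\n>seq2\nGGCC\n' > demo.fasta
check_fasta demo.fasta 2
```

For the mRNA download above, the call would be check_fasta filename 22. Anchoring the pattern with ^ counts only header lines, in case a ">" ever appears elsewhere in a description line.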


(lycopersicum [ORGN] AND EST) AND "Solanum pennellii"[porgn:__txid28526]


Try: Use Unix to verify that the file contains all the sequences…


Vitis [ORGN] AND EST
Nucleotide


Note syntax of ENTREZ search invoked by organism tree link


For class, I recommend downloading the smaller Nucleotide data set…


Try: Use Unix to verify that the file contains all the sequences…


Now what?

Which file needs to be formatted for BLAST (formatdb)?

Which file will be the query file?

What is the syntax for the BLAST command (including PATH)?


Formatdb

$ /path/formatdb -i /path/filename -p F

Run nucleotide BLAST (blastn)

$ /path/blastall -p blastn -d /path/filename -i /path/filename -o filename -e 0.01
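
The two commands above can be put together as in the sketch below. It prints the command lines instead of running them, so it can be checked on a machine without BLAST installed. The file names are hypothetical, /path is left as the same placeholder used above, and the helper names formatdb_cmd and blastn_cmd are invented for illustration. It assumes the downloaded subject sequences (here Vitis) are the file to format and the ABA pathway genes are the query.

```shell
# Hypothetical locations and file names; adjust to your own system.
BLAST_BIN=/path                      # directory holding formatdb and blastall
DB_FASTA=vitis_nucleotide.fasta      # subject sequences, to be formatted as the database
QUERY_FASTA=aba_genes.fasta          # ABA pathway sequences retrieved earlier
OUT=aba_vs_vitis.blastn.txt          # where the BLAST report will be written

# formatdb_cmd DB: print the formatdb command (-p F marks the input as nucleotide).
formatdb_cmd() {
    echo "$BLAST_BIN/formatdb -i $1 -p F"
}

# blastn_cmd DB QUERY OUT: print the blastn command with an E-value cutoff of 0.01.
blastn_cmd() {
    echo "$BLAST_BIN/blastall -p blastn -d $1 -i $2 -o $3 -e 0.01"
}

formatdb_cmd "$DB_FASTA"
blastn_cmd "$DB_FASTA" "$QUERY_FASTA" "$OUT"
```

Once the printed lines look right, they can be pasted at the prompt. The database must be formatted before the search is run, because blastall reads the index files that formatdb creates.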