Dr Aron Fazekas - Plant DNA Barcoding; data workflow
Post on 01-Dec-2014
Plant DNA Barcoding: data workflow
Aron Fazekas, University of Guelph
Workflow Outline:
raw sequence editing
data alignment
re-edit the sequence file
upload to BOLD
quality checks using BOLD / GenBank
Sequence editing: primer trimming
5’ GTTATGCATGAACGTAATGCTC
GAGCATTACGT….
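A minimal sketch of the trimming step, assuming the primer appears as an exact match at the 5' end of a read and its reverse complement at the 3' end. Real reads often carry miscalls inside the primer region, so a production tool would allow mismatches. The function names here are hypothetical.

```python
# Hypothetical helper names; exact-match trimming only (an assumption).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_primers(read, primer):
    """Remove the primer from the 5' end and its reverse complement
    from the 3' end, when either is present as an exact match."""
    read = read.upper()
    primer = primer.upper()
    primer_rc = reverse_complement(primer)
    if read.startswith(primer):
        read = read[len(primer):]
    if read.endswith(primer_rc):
        read = read[:-len(primer_rc)]
    return read
```

Called as trim_primers(read, "GTTATGCATGAACGTAATGCTC"), this would strip the primer shown above from the start of a read, and its reverse complement from the end of a read sequenced from the opposite direction.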
Sequence editing: editing miscalls
Sequence editing: congruence between forward/reverse reads
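The congruence check can be sketched as follows, assuming both reads have already been trimmed to cover exactly the same region, so that the reverse-complemented reverse read lines up base for base with the forward read. In practice the trace files would be assembled in a sequence editor; the function name is hypothetical.

```python
# Hypothetical helper; assumes the two reads span the same region.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_disagreements(forward, reverse):
    """Compare a forward read with the reverse complement of the
    reverse read; return (position, forward_base, reverse_base)
    for every mismatch, using 1-based positions."""
    reverse_rc = reverse.upper().translate(COMPLEMENT)[::-1]
    return [
        (pos, f, r)
        for pos, (f, r) in enumerate(zip(forward.upper(), reverse_rc), start=1)
        if f != r
    ]
```

Each reported position is a base to re-inspect in the trace files before accepting either call.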
Sequence Alignment
After editing: need to align the data
rbcL easy to align - most programs work well
matK tricky to align - TransAlign seems to do the best job
trnH difficult (impossible between genera?)
ITS difficult (impossible between genera?)
Kelchner (2000) Ann Missouri Bot Gard
Clustal www.clustal.org
TransAlign http://www.biomedcentral.com/1471-2105/6/156
K-Align http://www.ebi.ac.uk/Tools/msa/kalign/
Problems to look for after alignment:
- primers not trimmed
- gaps at the ends
- gaps in the middle (protein coding)
- translation shows stop codons
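The gap checks in the list above can be sketched for a single aligned sequence, assuming "-" is the gap character. The function name and the returned dictionary layout are hypothetical.

```python
import re


def gap_report(aligned):
    """Summarise gaps in one aligned sequence.

    Returns the number of leading and trailing gap characters and a
    list of (start_position, length) for each internal gap run,
    with 1-based positions in the alignment."""
    leading = len(aligned) - len(aligned.lstrip("-"))
    trailing = len(aligned) - len(aligned.rstrip("-"))
    core = aligned.strip("-")
    internal = [
        (m.start() + leading + 1, len(m.group()))
        for m in re.finditer(r"-+", core)
    ]
    return {"leading": leading, "trailing": trailing, "internal": internal}
```

In a protein-coding region, an internal gap whose length is not a multiple of three shifts the reading frame, so those entries deserve the closest look.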
Sequence Alignment
- primers not trimmed (trnH-psbA; real data submitted for publication)
- gaps at the ends
- gaps in the middle of a coding region (rbcL; data submitted for publication)
Translate coding regions (rbcL, matK) to ensure there are no stop codons present
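A minimal sketch of the translation check, assuming the sequence is already in the correct reading frame (or that the frame offset is known) and using the standard codon assignments. Gap characters are removed before translating, and codons containing ambiguous bases are shown as "X". The function name is hypothetical.

```python
from itertools import product

# Standard codon assignments, listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_stop_codons(seq, frame=0):
    """Translate seq from the given frame offset (0, 1 or 2) and
    return the protein string plus the 1-based codon positions of
    any stop codons ('*')."""
    seq = seq.upper().replace("-", "")
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    protein = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
    stops = [i + 1 for i, aa in enumerate(protein) if aa == "*"]
    return protein, stops
```

If the frame is unknown, running the check with frame set to 0, 1 and 2 in turn and keeping the frame with no stops is a common approach. A stop that persists in the chosen frame usually points to a miscall or an insertion/deletion that needs re-editing.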
Edit both the alignment file and the original sequence file
Can trnH-psbA (or other non-coding sequence) be aligned across diverse species?
Upload to BOLD
After the data are edited and aligned, use BOLD to create a tree
• Check for misplaced taxa – remove them from the dataset
• Check for singleton species – make a list
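The singleton list can also be built directly from the specimen records. This sketch assumes each record is a (sample ID, species name) pair, for example taken from a spreadsheet export; the function name and record layout are hypothetical.

```python
from collections import Counter


def singleton_species(records):
    """Return a sorted list of species represented by exactly one
    record. Each record is a (sample_id, species_name) pair."""
    counts = Counter(species for _sample_id, species in records)
    return sorted(name for name, n in counts.items() if n == 1)
```

Singletons cannot be tested for within-species clustering on the tree, which is why it helps to keep them on a separate list.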
BOLD BLAST check
GenBank BLAST check
Acknowledgements
Sujeevan Ratnasingham & BOLD Team