Course information To reach me: Barry Cohen bcohen @ cis . njit . edu GITC 4301 W 4:00-5:30 F 4:45-5:55 www.cs.njit.edu/~bcohen Web site, chat, web board, schedule my.njit.edu (no ‘www’) Guinea pig’s prerogative

Dec 21, 2015

Course information

• To reach me: Barry Cohen

bcohen@cis.njit.edu, GITC 4301, W 4:00-5:30, F 4:45-5:55, www.cs.njit.edu/~bcohen

• Web site, chat, web board, schedule: my.njit.edu (no ‘www’)

• Guinea pig’s prerogative

Projects

• Team projects (4 person)

• One hour presentations

• Literature review / algorithms / programs

• Sample applications

• Open problems

• Homework for practice

Texts

• Intro to Computational Molecular Biology, Setubal/Meidanis

• Biological sequence analysis, Durbin, Eddy, Krogh, Mitchison

• Recommended: Computational Methods in Molecular Biology – Salzberg/Searls/Kasif

Watson & Crick, 1953

http://www.nature.com/genomics/human/watson-crick/

Stylized double helix

Replication

• ‘It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.’

Sequence to structure

The information cycle

The triplet code

In the beginning …

• Life began when the earth was young

• Life arose from simple chemistry (most life still is relatively simple)

• Universal common ancestor

• Common molecular machinery (oldest fossils are living fossils)

What is life?

• Information and metabolism

• RNA world hypothesis

• DNA as program file (information coding for activity)

• Replication (information which codes for itself)

• Variation, evolution (life adapts to its environment)

DNA

• DNA is a polymer (sequence, string)

• DNA is composed of just four kinds of chemical units (A, C, G, T)

• DNA is redundant (double helix); A’s pair with T’s, G’s pair with C’s

• Some DNA codes for RNA, proteins (exons – expressed regions)

• Some DNA is noncoding (introns – intervening regions)

• Coherent sets of DNA are genes
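
The redundancy of the double helix can be made concrete: under standard Watson-Crick pairing (A with T, G with C), either strand determines the other. The sketch below is illustrative only; the function names `complement` and `reverse_complement` are assumptions, not part of the course material.

```python
# Watson-Crick pairing for DNA: A<->T, G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA string."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```

Applying `reverse_complement` twice returns the original string, which is one way to see that each strand carries the full information.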

RNA

• RNA is also a polymer (sequence, string)

• RNA is composed of just four kinds of chemical units (A, C, G, U)

• RNA is single stranded

• Some RNA codes for proteins, some is functional (e.g., tRNA)

Proteins

• Proteins account for most life activity and structure

• A protein is a polymer (sequence, string)

• Proteins are composed of 20 kinds of chemical units (amino acids)

• Proteins fold into a specific shape, which determines their function

• Proteins are made from genetic templates (they don’t code)
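
To illustrate how a protein string is read off a genetic template three letters at a time (the triplet code), here is a minimal sketch. The table `CODON_TABLE` is a deliberately small slice of the standard genetic code, and the function name `translate` and the `X` placeholder for unlisted codons are assumptions made for this example.

```python
# A small, intentionally incomplete slice of the standard genetic code,
# written with DNA letters. "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M",                                       # methionine (usual start)
    "TTT": "F", "TTC": "F",                           # phenylalanine
    "GGA": "G", "GGC": "G", "GGG": "G", "GGT": "G",   # glycine
    "AAA": "K", "AAG": "K",                           # lysine
    "TAA": "*", "TAG": "*", "TGA": "*",               # stop
}


def translate(dna: str) -> str:
    """Translate a coding DNA string codon by codon until a stop codon.

    Codons missing from the partial table are written as 'X'.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3].upper(), "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGTTTGGCAAATAA"))  # MFGK
```

A full table has 64 entries; the partial one here is only meant to show the mapping from a 4-letter alphabet to a 20-letter one.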

Evolution

• Darwin – evolution is adaptation

• Nature has no aim; it is the result of random events

• Most events are DNA string edits (indels, substitutions)

• Some events are on ‘higher level’ structures (e.g., chromosomes)
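
Viewing mutations as string edits suggests a natural measure of divergence: the fewest insertions, deletions, and substitutions needed to turn one sequence into another. The sketch below uses the classic dynamic-programming edit distance with unit costs; the unit costs and the name `edit_distance` are assumptions for illustration, and real alignment scoring is usually more refined.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of indels and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("GATTACA", "GATACA"))  # 1 (one deletion)
    print(edit_distance("ACGT", "AGGT"))       # 1 (one substitution)
```

The running time is proportional to the product of the two lengths, which matters once sequences get long.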

The ‘tree of life’

• Some errors in replication divide gene pools into two (speciation). (Or vice versa.)

• These bifurcations give the history of life a tree-like structure

rRNA universal tree of life

Algorithms

• An algorithm is a precise set of instructions for solving a problem (what do we mean by ‘precise’?)

• An algorithm must terminate

• Algorithms operate on data (inputs)

• Algorithms use data structures

Data structures

• A string is a natural mathematical model of a biological sequence

• A directed acyclic graph may represent familial descent

• A tree may represent species relations
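
As a small illustration of a tree used for species relations, the sketch below stores a tree as nested tuples: a leaf is a string, an internal node is a tuple of subtrees. This representation, the toy tree `TREE`, and the helper names `leaves` and `depth` are all assumptions chosen for brevity, not a format prescribed by the course.

```python
def leaves(node) -> list:
    """List the leaf labels of a nested-tuple tree, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


def depth(node) -> int:
    """Number of edges on the longest path from this node down to a leaf."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(child) for child in node)


# Toy example: human and chimp share a more recent ancestor than either
# shares with mouse.
TREE = (("human", "chimp"), "mouse")

if __name__ == "__main__":
    print(leaves(TREE))  # ['human', 'chimp', 'mouse']
    print(depth(TREE))   # 2
```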

Efficiency

• Bigger problems take more time and/or space (biology problems are often big)

• Harder problems take longer or more space (many biology problems are hard)

• Time (space), as a function of size, measures the complexity of an algorithm

• Many computable problems are intractable

Complexity classes

• Search a sorted list – log n

• Sort by comparison – n log n

• Text search – n

• Polynomial v. exponential time

• NP-complete problems
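
The gap between log n and n can be seen by counting probes. The sketch below compares a binary search on a sorted list with a plain left-to-right scan; the function names and the step-counting convention (one step per element examined) are assumptions made for this illustration.

```python
def binary_search_steps(sorted_items, target):
    """Return (index or -1, number of elements examined)."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, steps


def linear_search_steps(items, target):
    """Return (index or -1, number of elements examined)."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps - 1, steps
    return -1, len(items)


if __name__ == "__main__":
    data = list(range(1024))
    print(binary_search_steps(data, 1023))
    print(linear_search_steps(data, 1023))
```

On a list of 1024 items the binary search never examines more than 11 elements, while the scan may examine all 1024.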

Probability

• Basic molecular events in evolution occur with a certain probability (frequency)

• Probability models predict what may occur (likelihood of a pair of jacks)

• Probability models may also infer what most likely has occurred
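
As a worked instance of the card example, one reading of "a pair of jacks" is: two cards are drawn from a standard 52-card deck and both are jacks. That reading is an assumption, as is the function name `prob_two_jacks`; the sketch simply counts favorable hands over all hands.

```python
from fractions import Fraction
from math import comb


def prob_two_jacks() -> Fraction:
    """Probability that a 2-card hand from a 52-card deck is two jacks."""
    favorable = comb(4, 2)    # ways to choose 2 of the 4 jacks
    possible = comb(52, 2)    # ways to choose any 2 of the 52 cards
    return Fraction(favorable, possible)


if __name__ == "__main__":
    print(prob_two_jacks())  # 1/221
```

The same counting style carries over to sequence models, where the "deck" is an alphabet and the frequencies are estimated from data.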

Entropy

• Entropy is a measure of information content

• How many y/n questions are needed to get an answer?

• DNA positions differ in entropy, depending on how ‘conserved’ they are
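
To make the link between conservation and entropy concrete, the sketch below computes the Shannon entropy (in bits) of one column of aligned DNA letters. The function name `column_entropy` and the toy columns are assumptions for illustration; frequencies are taken directly from the observed letters with no correction for small samples.

```python
from collections import Counter
from math import log2


def column_entropy(column: str) -> float:
    """Shannon entropy, in bits, of the letters observed at one position."""
    total = len(column)
    counts = Counter(column)
    # Sum of p * log2(1/p) over the letters that actually occur.
    return sum((n / total) * log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(column_entropy("AAAA"))  # 0.0, fully conserved
    print(column_entropy("AACC"))  # 1.0, one yes/no question
    print(column_entropy("ACGT"))  # 2.0, two yes/no questions
```

A fully conserved position needs no questions to identify its letter, while a position where all four bases are equally likely needs two.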