Deconvoluting BAC-gene Relationships Using a Physical Map
Y. Wu¹, L. Liu¹, T. Close², S. Lonardi¹
¹Department of Computer Science & Engineering
²Department of Botany & Plant Sciences
Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007
Selective sequencingSelective sequencing
• Many organisms are unlikely to be Many organisms are unlikely to be sequenced in the near future due to the sequenced in the near future due to the large size and highly repetitive content of large size and highly repetitive content of their genomestheir genomes
• Selective sequencing:Selective sequencing: obtain the sequence obtain the sequence of a small set of BAC clones that contain a of a small set of BAC clones that contain a specific set of genes of interestspecific set of genes of interest
• How do we identify these BAC clones?How do we identify these BAC clones?BAC-gene deconvolution problemBAC-gene deconvolution problem
An illustration of the problem
Hybridization with probes
• The presence of a gene in a BAC can be determined by a hybridization experiment (e.g., using a unique probe designed from the gene)
• Since the numbers of BAC clones and probes are typically in the tens of thousands, carrying out an experiment for each (BAC, probe) pair is usually infeasible
• Group testing (or pooling) has to be used
Hybridization with pools of probes
• Probes can be arranged into pools for group testing. However, to achieve exact deconvolution this strategy could still be infeasible because of the large number of pools required
• Question: Can we use a small number of pools (e.g., a 1- or 2-decodable pool design) and still achieve accurate deconvolution?
Dealing with the limitations of pooling
• Answer: Yes, if one compensates for the lack of information obtained by a weak pooling design with knowledge of the overlapping structure of the BACs
• In this way, the number of pools required is reduced, so the experiments are less expensive and less time-consuming
Hybridization data
h(b,p) = 1 (pool p hybridizes to BAC b)
– b must contain at least one of the probes/genes represented by p
– positive information
h(b,p) = 0 (pool p does not hybridize to BAC b)
– b cannot contain any of the probes/genes represented by p
– negative information
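The two cases above translate directly into constraints. The sketch below derives both kinds of information from a hybridization table and a pool content table. `H` is the hybridization table shown in the slides; the pool contents in `POOLS` are a hypothetical toy design assumed for illustration (not taken from the slides), and the function names are illustrative.

```python
# Toy instance. H follows the slides' hybridization table; POOLS is a
# hypothetical pool design assumed for illustration.
POOLS = {
    "p1": {"u1", "u2", "u3"},
    "p2": {"u3", "u4", "u5"},
    "p3": {"u5", "u6", "u7"},
    "p4": {"u7", "u8", "u9"},
}
H = {
    "b1": {"p1": 1, "p2": 0, "p3": 0, "p4": 0},
    "b2": {"p1": 1, "p2": 1, "p3": 0, "p4": 0},
    "b3": {"p1": 0, "p2": 1, "p3": 1, "p4": 0},
    "b4": {"p1": 0, "p2": 0, "p3": 1, "p4": 1},
    "b5": {"p1": 0, "p2": 0, "p3": 0, "p4": 1},
}


def positive_information(h, pools):
    """h(b,p)=1: BAC b must contain at least one probe of pool p.
    Returns {(b, p): probes of p}, one 'at least one of' constraint per pair."""
    return {
        (b, p): set(pools[p])
        for b, row in h.items()
        for p, value in row.items()
        if value == 1
    }


def negative_information(h, pools):
    """h(b,p)=0: BAC b contains none of the probes of pool p.
    Returns {b: probes that b cannot contain}."""
    excluded = {}
    for b, row in h.items():
        excluded[b] = set()
        for p, value in row.items():
            if value == 0:
                excluded[b] |= pools[p]
    return excluded


if __name__ == "__main__":
    pos = positive_information(H, POOLS)
    neg = negative_information(H, POOLS)
    print(len(pos))  # 8 hybridizing (BAC, pool) pairs
    print({b: len(probes) for b, probes in neg.items()})
    # {'b1': 7, 'b2': 5, 'b3': 6, 'b4': 5, 'b5': 7}
```

A probe may appear both in a hybridizing pool and in a non-hybridizing one; in that case the negative information rules it out for that BAC.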
Deconvolution problem
• Given h(b,p) for all pairs (b,p), the deconvolution problem is to establish a one-to-many assignment between the probes u and the clones b that is consistent with the values of h
1. Basic deconvolution: uses only the information obtained from group testing
2. Improved deconvolution: also uses the physical map
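The definition above can be read as a checker: an assignment (each probe mapped to the set of BACs containing it) is a solution exactly when, for every pair (b, p), pool p hybridizes to b if and only if at least one probe of p is assigned to b. A minimal sketch; the function name and the toy data are hypothetical.

```python
def satisfies(h, pools, assignment):
    """Return True if `assignment` (probe -> set of BACs containing it) agrees
    with every hybridization value h[b][p]:
      h = 1 -> some probe of pool p lies in BAC b
      h = 0 -> no probe of pool p lies in BAC b
    """
    for b, row in h.items():
        for p, value in row.items():
            hit = any(b in assignment.get(u, set()) for u in pools[p])
            if hit != (value == 1):
                return False
    return True


# Hypothetical toy data: two pools, two BACs.
pools = {"p1": {"u1", "u2"}, "p2": {"u2", "u3"}}
h = {"b1": {"p1": 1, "p2": 0}, "b2": {"p1": 1, "p2": 1}}

print(satisfies(h, pools, {"u1": {"b1", "b2"}, "u3": {"b2"}}))  # True
print(satisfies(h, pools, {"u2": {"b1", "b2"}}))  # False: u2 in b1 makes p2 hit b1
```

In general many assignments satisfy h, which is why the later slides turn deconvolution into an optimization problem.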
Input to the basic deconvolution
h    p1  p2  p3  p4
b1    1   0   0   0
b2    1   1   0   0
b3    0   1   1   0
b4    0   0   1   1
b5    0   0   0   1
Hybridization table
p_i is a pool, b_j is a BAC, u_k is a probe/gene
Pool content table (rows p1–p4, columns u1–u9): each pool contains three of the nine probes/genes; the hybridization table is the same as above
Positive information
One row for each pair (b, p) with h(b,p) = 1: (b1,p1), (b2,p1), (b2,p2), (b3,p2), (b3,p3), (b4,p3), (b4,p4), (b5,p4). Each row has a 1 in the columns (u1–u9) of the three probes contained in pool p.
Negative information
One row per BAC (b1–b5), columns u1–u9; a 0 marks a probe that the BAC cannot contain. Number of 0s per row: b1: 7, b2: 5, b3: 6, b4: 5, b5: 7.
• Each row represents a constraint to be satisfied
• If a row contains only one "1", then the relationship between the BAC and the probe is resolved exactly
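Basic deconvolution can be sketched as follows: remove from each positive constraint the probes excluded by negative information, then report every constraint left with a single candidate as an exactly resolved (BAC, probe) pair. `H` is the slides' hybridization table; `POOLS` is a hypothetical toy design, not from the slides. Under this toy design no constraint shrinks to one candidate, which illustrates why basic deconvolution alone is often insufficient.

```python
# Hypothetical toy pool design; H is the hybridization table from the slides.
POOLS = {
    "p1": {"u1", "u2", "u3"},
    "p2": {"u3", "u4", "u5"},
    "p3": {"u5", "u6", "u7"},
    "p4": {"u7", "u8", "u9"},
}
H = {
    "b1": {"p1": 1, "p2": 0, "p3": 0, "p4": 0},
    "b2": {"p1": 1, "p2": 1, "p3": 0, "p4": 0},
    "b3": {"p1": 0, "p2": 1, "p3": 1, "p4": 0},
    "b4": {"p1": 0, "p2": 0, "p3": 1, "p4": 1},
    "b5": {"p1": 0, "p2": 0, "p3": 0, "p4": 1},
}


def basic_deconvolution(h, pools):
    """Return (constraints, resolved).

    constraints[(b, p)] is the set of probes of pool p that BAC b may still
    contain after removing probes excluded by any pool with h(b, .) = 0.
    resolved holds the (b, u) pairs fixed by constraints with one candidate.
    """
    # Negative information: probes each BAC cannot contain.
    excluded = {b: set() for b in h}
    for b, row in h.items():
        for p, value in row.items():
            if value == 0:
                excluded[b] |= pools[p]

    # Positive information, filtered by the negative information.
    constraints, resolved = {}, set()
    for b, row in h.items():
        for p, value in row.items():
            if value == 1:
                candidates = pools[p] - excluded[b]
                constraints[(b, p)] = candidates
                if len(candidates) == 1:
                    resolved.add((b, next(iter(candidates))))
    return constraints, resolved


if __name__ == "__main__":
    constraints, resolved = basic_deconvolution(H, POOLS)
    for pair, candidates in constraints.items():
        print(pair, sorted(candidates))
    print("resolved:", resolved)  # resolved: set()
```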
• Basic deconvolution is not sufficient
• BACs are assembled into contigs by FPC (a contig is a set of BAC clones)
• We assume the probes are unique; therefore, each probe can belong to exactly one contig
[Figure: Contig 1, Contig 2]
Optimization problem
• We formulate the deconvolution as an optimization problem
• The problem is NP-complete (proof in the paper, by reduction from 3SAT)
Integer Linear Programming
• The optimization problem can be solved via integer linear programming (ILP)
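As an illustration only (not the formulation from the paper), one hypothetical max-coverage-style ILP for the contig setting is shown below. It assumes that each probe u is placed in exactly one contig, that the goal is to satisfy as many positive constraints (b, p) as possible, and it uses assumed notation: x_{u,c} = 1 if probe u is placed in contig c, y_{b,p} = 1 if constraint (b, p) is satisfied, C(b) the contig containing BAC b, P(p) the probes of pool p, and E(b) the probes excluded from b by negative information.

```latex
\begin{aligned}
\max\quad & \sum_{(b,p)\,:\,h(b,p)=1} y_{b,p} \\
\text{s.t.}\quad & \sum_{c} x_{u,c} = 1 && \text{for every probe } u \\
& y_{b,p} \le \sum_{u \in P(p)\setminus E(b)} x_{u,C(b)} && \text{for every } (b,p) \text{ with } h(b,p)=1 \\
& x_{u,c} \in \{0,1\},\quad y_{b,p} \in \{0,1\}
\end{aligned}
```

Replacing the integrality constraints with 0 ≤ x, y ≤ 1 gives the corresponding LP relaxation.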
LP and randomized rounding
• The ILP is relaxed to the corresponding LP, then the LP is solved exactly (via the GLPK package)
• The optimal solution to the LP is mapped to a valid solution to the ILP via randomized rounding
• We prove that our method achieves approximation ratio (1 - 1/e)
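A minimal sketch of the rounding step, assuming an LP solution in which each probe u has fractional values x[u][c] over contigs that sum to 1 (the data layout and function names are hypothetical; solving the LP itself, done with GLPK in the slides, is not shown). Each probe is assigned independently to contig c with probability x[u][c]. This is the standard rounding behind (1 - 1/e) guarantees: a constraint "at least one of k probes lands in contig c" with LP value y is satisfied with probability 1 - prod(1 - x_u) >= (1 - (1 - 1/k)^k) * y >= (1 - 1/e) * y. We assume, but the slides do not confirm, that the paper's proof follows this pattern.

```python
import random


def randomized_rounding(x, rng):
    """x maps probe -> {contig: LP value}, with the values for each probe
    summing to 1. Each probe is independently assigned to one contig,
    choosing contig c with probability x[probe][c]."""
    assignment = {}
    for probe, dist in x.items():
        contigs = list(dist)
        weights = [dist[c] for c in contigs]
        assignment[probe] = rng.choices(contigs, weights=weights, k=1)[0]
    return assignment


def count_satisfied(constraints, assignment):
    """constraints is a list of (contig, candidate probes); a constraint is
    satisfied if at least one candidate probe was assigned to that contig."""
    return sum(
        any(assignment.get(u) == contig for u in probes)
        for contig, probes in constraints
    )


# Hypothetical fractional LP solution and constraints.
x = {"u1": {"c1": 0.5, "c2": 0.5}, "u2": {"c1": 1.0}, "u3": {"c2": 1.0}}
constraints = [("c1", {"u1", "u2"}), ("c2", {"u1", "u3"})]

rng = random.Random(42)  # fixed seed for reproducibility
assignment = randomized_rounding(x, rng)
print(assignment, count_satisfied(constraints, assignment))
```

Probes whose LP values are already integral (u2, u3 here) keep their contig; only fractional probes are actually randomized.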
Experimental results on rice genome
• The whole genome sequence for rice is available
• BAC library and fingerprinting data are available from AGI
• BAC-end sequences are also available from GenBank
• The physical map was built using FPC
• Coordinates of the BACs on the genome were determined by BLASTing BAC-end sequences against the genome
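Placing a BAC from its two BAC-end alignments can be sketched as follows. The sketch assumes each end has already been reduced to a single best hit (chromosome, position), and accepts a BAC only if both ends hit the same chromosome at a distance compatible with a BAC insert. The size bounds are hypothetical defaults, not values from the study.

```python
from typing import Optional, Tuple

Hit = Tuple[str, int]  # (chromosome, position of the best BAC-end alignment)


def bac_interval(
    end_a: Hit,
    end_b: Hit,
    min_len: int = 50_000,   # hypothetical lower bound on BAC insert size
    max_len: int = 300_000,  # hypothetical upper bound on BAC insert size
) -> Optional[Tuple[str, int, int]]:
    """Return (chromosome, start, end) spanned by the two BAC ends, or None
    if the ends map to different chromosomes or imply an implausible length."""
    (chrom_a, pos_a), (chrom_b, pos_b) = end_a, end_b
    if chrom_a != chrom_b:
        return None
    start, end = sorted((pos_a, pos_b))
    if not (min_len <= end - start <= max_len):
        return None
    return chrom_a, start, end


print(bac_interval(("chr1", 151_000), ("chr1", 1_000)))  # ('chr1', 1000, 151000)
print(bac_interval(("chr1", 1_000), ("chr2", 151_000)))  # None
```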
Experimental results on rice genome
• Rice unigenes are available from NCBI
• Unique probes for the unigenes were designed by the Oligospawn software
• Experiments focused on chromosome I
• Probe pools were designed following the shifted transversal design (STD)
• Dataset: 2,002 probes and 2,629 BACs
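The shifted transversal design can be sketched as follows. This follows our reading of the STD definition and is not the authors' code. For a prime q and Γ the smallest integer with q^(Γ+1) ≥ n, layer j < q places item x in pool Σ_c j^c·⌊x/q^c⌋ mod q, and an optional layer q uses ⌊x/q^Γ⌋. Two distinct items then share at most Γ pools, because two distinct polynomials of degree ≤ Γ over GF(q) agree on at most Γ points. Consequently, with k ≥ tΓ + 1 layers, the pools of an item can never all be covered by t other items.

```python
def is_prime(q: int) -> bool:
    return q >= 2 and all(q % d for d in range(2, int(q ** 0.5) + 1))


def std_pools(n: int, q: int, k: int) -> list:
    """Shifted transversal design sketch: n items (0..n-1), prime q,
    k <= q + 1 layers of q pools each. Returns a list of k*q sets of items."""
    if not is_prime(q) or not 1 <= k <= q + 1:
        raise ValueError("need a prime q and 1 <= k <= q + 1")
    gamma = 0  # smallest gamma with q ** (gamma + 1) >= n
    while q ** (gamma + 1) < n:
        gamma += 1
    pools = [set() for _ in range(k * q)]
    for x in range(n):
        for j in range(k):
            if j < q:
                # Python defines 0 ** 0 == 1, as the construction requires
                pool = sum(j ** c * (x // q ** c) for c in range(gamma + 1)) % q
            else:
                pool = x // q ** gamma  # special last layer
            pools[j * q + pool].add(x)
    return pools


pools = std_pools(n=20, q=5, k=6)
print(len(pools))  # 30
```

Here n = 20 and q = 5 give Γ = 1, so any two items share at most one of the 30 pools.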
Experimental results
Findings
• We proposed a new method to solve the BAC-gene deconvolution problem based on integer linear programming
• Experimental results show that our method is accurate and effective
Thank you
• Funding
• Serdar Bozdag (UC Riverside) for providing the rice data (fingerprinting and hybridization)
Hybridization with pools of probes
• Probes can be arranged into pools for group testing
• To achieve exact deconvolution, this strategy can still be infeasible
• The reason: a BAC may contain several, if not tens of, genes, so the "decodability" of the pool design has to be high to achieve exact deconvolution …
Hybridization with pools of probes
• … the pool size has to be small, which implies that the number of pools will be large
• Question: Can we use a low-decodability (1- or 2-decodable) pool design and still achieve good deconvolution?
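The slides do not define decodability formally. Assuming a d-decodable design is one that is d-disjunct (for every probe x and every set S of d other probes, some pool contains x but no probe of S, so up to d positives can be decoded exactly), a brute-force check can be sketched as follows. It is exponential in d and only meant for small examples; the function name is hypothetical.

```python
from itertools import combinations


def is_d_disjunct(pools, n, d):
    """pools: list of sets of item indices in 0..n-1.
    True if no item's pools are all covered by the pools of d other items."""
    membership = [{i for i, pool in enumerate(pools) if x in pool} for x in range(n)]
    for x in range(n):
        others = [y for y in range(n) if y != x]
        for group in combinations(others, d):
            covered = set().union(*(membership[y] for y in group))
            if membership[x] <= covered:
                return False
    return True


triangle = [{0, 1}, {1, 2}, {0, 2}]  # 3 items, each in 2 of 3 pools
print(is_d_disjunct(triangle, 3, 1))  # True
print(is_d_disjunct(triangle, 3, 2))  # False
```

The toy example shows the trade-off from the slide: three pools suffice to decode one positive among three items, but not two.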
• For example, if we knew that BAC b_i and BAC b_j overlap by 80%, then if a probe u belongs to BAC b_i, it is very likely that u also belongs to b_j
• On the other hand, if we knew that BAC b_i and BAC b_j do not overlap, then if a probe u belongs to BAC b_i, it is very unlikely that u also belongs to BAC b_j
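This intuition can be quantified by the fraction of overlap between two BACs placed on the genome. A minimal sketch, assuming half-open coordinate intervals and measuring overlap relative to the shorter BAC (other normalizations are possible; the slides do not specify one).

```python
def overlap_fraction(a, b):
    """a, b: (start, end) half-open intervals of two BACs on the same chromosome.
    Returns the overlap length divided by the length of the shorter BAC."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return overlap / min(a[1] - a[0], b[1] - b[0])


print(overlap_fraction((0, 100_000), (20_000, 120_000)))   # 0.8
print(overlap_fraction((0, 100_000), (150_000, 250_000)))  # 0.0
```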
• The basic deconvolution step is not sufficient
• The overlapping structure of the BACs is used to resolve additional relationships between BACs and probes
Sketch of the algorithm
Perfect physical map
• Cut the chromosome at the points where a BAC starts or ends
• Let's call the resulting pieces fragments
• Each fragment is covered by a set of BACs
• Assume the probes are unique; therefore, each probe can only belong to one fragment
[Figure: fragments f1, f2, f3, f4, f5]
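Cutting at BAC endpoints can be sketched as follows, assuming BACs are given as half-open (start, end) intervals on one chromosome (the input format and function name are hypothetical). Each returned fragment comes with the set of BACs covering it; under the unique-probe assumption, placing a probe in a fragment determines exactly which BACs contain it.

```python
def fragments(bacs):
    """bacs: dict BAC name -> (start, end). Returns a list of
    ((frag_start, frag_end), covering BACs), skipping uncovered gaps."""
    cuts = sorted({pos for start, end in bacs.values() for pos in (start, end)})
    result = []
    for left, right in zip(cuts, cuts[1:]):
        covering = {name for name, (s, e) in bacs.items() if s <= left and right <= e}
        if covering:
            result.append(((left, right), covering))
    return result


bacs = {"b1": (0, 100), "b2": (50, 150), "b3": (120, 200)}
for interval, covering in fragments(bacs):
    print(interval, sorted(covering))
# (0, 50) ['b1']
# (50, 100) ['b1', 'b2']
# (100, 120) ['b2']
# (120, 150) ['b2', 'b3']
# (150, 200) ['b3']
```

Three overlapping BACs already produce five fragments, each with a distinct covering set.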
Optimization problem
• The optimization problem is formulated similarly
ILP
Solving the optimization problem
• This problem is also NP-complete
• It is solved via ILP, followed by LP relaxation and randomized rounding
• A similar performance guarantee can be proved
Sketch of the algorithm