Systems-Based Approaches at the Frontiers of Chemical Engineering and Computational Biology: Advances and Challenges Christodoulos A. Floudas Princeton University Department of Chemical Engineering Program of Applied and Computational Mathematics Department of Operations Research and Financial Engineering Center for Quantitative Biology
Outline
• Theme: Scientific/Personal Journey
• Research Philosophy at CASL (Computer Aided Systems Laboratory, Princeton University)
• Research Areas: Advances & Challenges
• Acknowledgements
Greece
Ioannina, Greece (Courtesy of G. Floudas)
Thessaloniki, Greece
Aristotle University of Thessaloniki
Department of Chemical Engineering
Aristotle University of Thessaloniki
Undergraduate Studies (1977-1982)
Q1: What is the optimal topology? Binary Terms
Q2: Which plants exist? Binary Variables
Objective: Minimize Overall Cost
- Plant construction and operating costs
- Pipeline construction and operating cost
Binary Variables
- y^a_{s,e}: Existence of stream connecting source s to exit stream e.
- y^b_{t,e}: Existence of stream connecting plant t to exit stream e.
- y^c_{t,t'}: Existence of directed stream connecting plant t to plant t'.
- y^d_{s,t}: Existence of stream connecting source s to plant t.
- y^e_t: Existence of plant t.
Formulation of Generalized Pooling Problem
Problem Characteristics
- Mixed integer bilinear programming problem with bilinearities involving pairs of continuous variables, (b,f) and (c,f) and (d,f).
- Nonconvex mass balance constraints on the species include bilinear terms.
- Industrial case study: |C| = 3, |E| = 1, |S| = 7, |T| = 10.
- Number of nonconvex equality constraints: |C| x (|T| + |E|). (33)
- Number of bilinear terms: |C| x |T| x (|E| + |S| + 2|T| - 2). (780)
- Number of binary variables: |T| x (|E| + |S| + |T|) + |S| x |E|. (187)
- Complex network structure with numerous feasible yet nonoptimal possibilities.
- Fixing the y variables, the problem is a nonconvex bilinear NLP.
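The three counts quoted above follow directly from the set sizes of the industrial case study. A minimal Python sketch that reproduces them is given here; the function and argument names are illustrative and do not come from the talk.

```python
def pooling_problem_size(n_c, n_e, n_s, n_t):
    """Size counts for the generalized pooling problem, using the slide's formulas.

    n_c, n_e, n_s, n_t stand for |C|, |E|, |S|, |T|
    (components, exit streams, sources, potential plants).
    """
    nonconvex_equalities = n_c * (n_t + n_e)
    bilinear_terms = n_c * n_t * (n_e + n_s + 2 * n_t - 2)
    binary_variables = n_t * (n_e + n_s + n_t) + n_s * n_e
    return nonconvex_equalities, bilinear_terms, binary_variables


# Industrial case study: |C| = 3, |E| = 1, |S| = 7, |T| = 10
sizes = pooling_problem_size(n_c=3, n_e=1, n_s=7, n_t=10)
print(sizes)  # (33, 780, 187)
```

The binary count also matches a tally of the five variable families: 7 (y^a) + 10 (y^b) + 90 (y^c, t different from t') + 70 (y^d) + 10 (y^e) = 187.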
Feasible Solutions
[Figure: four feasible network topologies connecting sources S1-S7 to exit stream E1 through different subsets of the plants; the plant subsets shown include {T2, T3, T7, T9}, {T1, T3, T7, T9} and {T3, T7, T9, T10}. Objective function values shown: 1.132e6, 1.198e6, 1.086e6 and 1.620e6.]
Industrial Case Study
Components: 3            Best known solution: 1.086 x 10^6
Sources: 7               Lower bound on solution: 1.070 x 10^6
Exit streams: 1          Absolute Gap: 0.016 x 10^6
Potential plants: 10     Relative Gap: 1.5 %
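The gap figures can be checked with a few lines of Python. This is a sketch under one assumption: the relative gap is taken with respect to the best known solution (the upper bound). The slide does not say which denominator was used, and dividing by the lower bound instead gives the same 1.5 % after rounding.

```python
def optimality_gap(upper, lower):
    """Absolute and relative gap between the best known solution and the lower bound.

    Assumption: the relative gap is measured against the upper bound.
    """
    absolute = upper - lower
    relative = absolute / upper
    return absolute, relative


abs_gap, rel_gap = optimality_gap(upper=1.086e6, lower=1.070e6)
print(f"absolute gap: {abs_gap:.3e}, relative gap: {100 * rel_gap:.1f} %")
```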
Formulation       ℜ var.   {0,1} var   Constr.   CPU (s)   Obj (10^6)
Nonconvex         207      187         424       2.5       1.086
Bilinear Terms    987      187         3544      58        0.550
RLT               3850     187         362119321 (CPU and Constr. digits run together)   0.743
Subnetwork {t3, t7, t9, t10}
Bin RLT N = 2     766      79          4866      519       0.977
Bin RLT N = 3     766      91          6234      816       1.005
Bin RLT N = 4     766      103         7602      3672      1.022
Bin RLT N = 5     766      115         8970      7617      1.031
Bin RLT N = 6     766      127         10338     85800     1.051
Bin RLT N = 7     766      139         11706     59486     1.070
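The "Bilinear Terms" row refers to a relaxation of the bilinear products. A standard way to relax a single product w = x*y over a box is the McCormick envelope, which replaces the product with four linear inequalities. The sketch below evaluates those inequalities at a point. Treating the talk's relaxation as exactly this construction is an assumption, and all names are illustrative.

```python
def mccormick_bounds(x, y, x_lo, x_hi, y_lo, y_hi):
    """Lower and upper McCormick estimates of w = x*y at the point (x, y).

    Valid for x in [x_lo, x_hi] and y in [y_lo, y_hi]; each of the four
    expressions below is one linear inequality of the envelope.
    """
    lower = max(
        x_lo * y + x * y_lo - x_lo * y_lo,
        x_hi * y + x * y_hi - x_hi * y_hi,
    )
    upper = min(
        x_hi * y + x * y_lo - x_hi * y_lo,
        x_lo * y + x * y_hi - x_lo * y_hi,
    )
    return lower, upper


# Hypothetical point and bounds, chosen only for illustration.
lower, upper = mccormick_bounds(x=0.5, y=2.0, x_lo=0.0, x_hi=1.0, y_lo=1.0, y_hi=3.0)
print(lower, 0.5 * 2.0, upper)
```

Because the envelope is linear, fixing the binary variables turns the relaxed problem into a linear program whose optimum is a lower bound on the nonconvex objective; tighter variable bounds give a tighter envelope.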
Challenges/Opportunities
Current Status: Great Success for Theory & Algorithm for Small to Medium-size Applications
• Improved Convex Underestimation Methods
• New Theoretical Results on Convex Envelopes
• Medium to Large-scale C2-NLPs
– Pooling Problems
• Medium to Large-scale MINLPs
– Product & Process Design/Synthesis/Operations
– Signal Transduction/Metabolic Pathways
– Generalized Pooling Problems
• New Theory and Algorithms for DAE Models
• New Theory and Algorithms for Grey/Black box models
• Multi-level Nonlinear Optimization
Computational Biology & Genomics (1990-)
Computational Biology & Genomics
How/When did we start?
Motivation: Multiple local minima in
- Lennard-Jones Cluster packing
- Structure prediction of small molecules
- Structure prediction of oligo-peptides
- Structure prediction in protein folding
Initial Studies: 1990-95
Computational Biology and Genomics
• Structure Prediction in Lennard-Jones Clusters & Acyclic Molecules (90-95)
• Structure Prediction in Protein Folding (95-)
• Dynamics in Protein Folding (96-00)
• Force Field Development (01-)
• De Novo Protein Design (01-)
• Protein-Peptide Interactions (95-03)
• Metabolic and Signal Transduction Networks (95-)
• Proteomics: Peptide & Protein Identification (05-)
Beta strand and sheet structure
MHRTSNGSHATGGNLPDVASHYPVAYEQTLDGTVGFVIDEMTPERATASVEVTDTLRQRWGLVHGGAYCALAEMLATEATVAVVHEKGMMAVGQSNHTSFFRPVKEGHVRAEAVRIHAGSTTWFWDVSLRDDAGRLCAVSSMSIAVRPRRD
• Prediction of Disulfide Bridges
• Force-field development for Fold Recognition
• New/Improved Methods for Threading/Fold Recognition
• Uncertainty in Force-fields
• Packing of Helices in Globular Proteins
• Prediction of Tertiary Interhelical Contacts in α and α/β proteins
• Helical Membrane Proteins (e.g. GPCRs)
• Improved prediction of Helical Sequences
• Loop predictions
• Packing of Helices in Lipid Bilayers
• 3-D structure prediction
De Novo Protein Design
Define target template
Human β-Defensin-2, hbd-2 (PDB: 1fqq)
Backbone coordinates for N, Ca, C, O and possibly Ca-Cb vectors from PDB
Full sequence design
Mayo et al.; Hellinga et al.; DeGrado et al.; Saven et al.; Hecht et al.
Which amino acid sequences will stabilize this target structure?
Combinatorial complexity
- Backbone length: n
- Amino acids per position: m
- m^n possible sequences
Design folded protein
Challenges
In silico sequence selection
Fold validation/specificity
De Novo Protein Design
Structure to Function
Enhance Structural Stability
Enhance Functionality
Combinatorial complexity
• Backbone length: n
• Amino acids per position: m
• m^n possible sequences
Multiplicity of sequences
• How to determine most stable?
• How to determine most functional?
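To give a sense of scale for the m^n count, the short sketch below evaluates it for a 13-residue backbone, the length of the compstatin peptide discussed in this talk. It assumes all 20 natural amino acids are allowed at every position, which is the unrestricted worst case; the function name is illustrative.

```python
def sequence_space(n_positions, n_choices=20):
    """Number of possible sequences: m^n with m choices at each of n positions."""
    return n_choices ** n_positions


size_13 = sequence_space(13)
print(f"{size_13:.3e}")  # about 8.192e+16 sequences for n = 13, m = 20
```

Restricting each position to a handful of candidate mutations shrinks m, and with it the search space, by many orders of magnitude.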
De Novo Protein Design Framework: Advances
Klepeis, Floudas, Lambris & Morikis, JACS (2003); Klepeis et al., IECR (2004); Loose, Klepeis, Floudas, PROTEINS (2004); Fung, Rao, Floudas et al., JOCO (2005); Fung, Taylor, Floudas et al., OMS (2006)
Sequence selection
• Identify target template for desired fold; specify coordinates of backbone
• Identify possible residue mutations
• Introduce distance-dependent pairwise potential based on Ca
• Generate rank-ordered energetic list from mixed-integer linear optimization (MILP)
Fold Validation via Astro-Fold
• Model selected sequences using flexible, detailed energetics
• Employ global optimization for free system
• Employ global optimization for system constrained to template
• Calculate relative probability for structures similar to desired fold
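The sequence selection stage can be pictured on a toy instance. The sketch below scores every candidate sequence with a sum of pairwise energies and returns a rank-ordered list. It uses exhaustive enumeration in place of the MILP described above, so it only works for tiny cases. The candidate residues and energy values are made up for illustration, and the function name is hypothetical.

```python
from itertools import combinations, product


def rank_sequences(allowed, pair_energy, top_k=3):
    """Return the top_k lowest-energy sequences as (energy, sequence) pairs.

    allowed[i] lists the candidate residues at position i.
    pair_energy[(i, a, j, b)] is the energy of residue a at position i
    interacting with residue b at position j (missing entries count as 0).
    """
    scored = []
    for seq in product(*allowed):
        total = sum(
            pair_energy.get((i, seq[i], j, seq[j]), 0.0)
            for i, j in combinations(range(len(seq)), 2)
        )
        scored.append((total, "".join(seq)))
    return sorted(scored)[:top_k]


# Hypothetical data: three positions, two candidate residues each.
allowed = [("A", "V"), ("G", "S"), ("W", "Y")]
pair_energy = {
    (0, "V", 2, "W"): -1.5,
    (0, "A", 2, "Y"): -0.7,
    (1, "G", 2, "W"): -0.4,
    (0, "V", 1, "S"): 0.9,
}
ranking = rank_sequences(allowed, pair_energy)
print(ranking)
```

In a MILP formulation the same ranking is typically obtained by solving repeatedly and excluding previously found sequences with integer cuts, which avoids enumerating the full m^n space.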
Compstatin
Potent inhibitor of third component of complement
Structural features
• Cyclic, 13 residues
• Disulfide Bridge Cys2-Cys12
• Central beta-turn Gln5-Asp6-Trp7-Gly8
• Hydrophobic core
• Acetylated form displays higher inhibitory activity
Functional features
• Binds to and inactivates third component of complement
• Structure of bound complex not yet available
with Dr. John Lambris (Univ. of Pennsylvania) and Dr. Dimitri Morikis (Univ. of California, Riverside)
Ac-compstatin
In Silico De Novo Design
Analog Ac-V4Y/H9A
Analog Ac-W4Y/H9A
x7, x16, x45
Klepeis, Floudas, Morikis, Tsokos, Argyropoulos, Spruce, Lambris (2003) J. American Chemical Society.
Klepeis, Floudas, Morikis, Lambris (2004) Ind. & Eng. Chem. Res.
Fung, Rao, Floudas (2005); Fung, Taylor, Floudas (2006)
Challenges and Opportunities
• Improved Methods for In Silico Sequence Selection with flexible templates from 20^13 to 20^50 to 20^100
• Improved Force-field development for De Novo Protein Design
• Simultaneous Sequence and Structure Selection
• Design of Peptidic Inhibitors for Complement 3
• Design of novel human β-defensin
• Discovery of novel GPCRs
• De Novo Design of Medium-size Proteins
• Map Sequences to Known Folds
Proteomics: Peptide and Protein Identification via Tandem Mass Spectroscopy
[Figure: identification workflow across three levels (protein level, peptide level, MS-MS spectra level). Experimental side: protein sample -> enzymatic digestion -> peptide mixture (e.g. LKYVI, STCMYAR, DILNG, GGAWKLK, ILFAD) -> mixture separation and MS-MS sequencing -> MS-MS spectra. Computational side: MS-MS spectra -> database search and validation -> peptide identifications -> peptide grouping and validation -> protein identifications (A, B, C).]
Peptide & Protein Identification via Tandem MS
• Database-based methods
– Correlate the experimental spectra with spectra of peptides/proteins which exist in the databases
• De Novo Methods
– Predict peptides without sequence databases
– Exhaustive listing; sub-sequencing; graphical
– Graph theory and shortest path algorithms
– Graph theory and dynamic programming
– Bayesian scoring of random peptides
Key idea
Utilization of binary variables to model logical decisions: 1 = yes; 0 = no
• Paths between peaks (w_ij)
• Selection of peaks (p_i)
Novel Concept: use of mixed-integer linear optimization (MILP) to solve the peptide sequencing problem
De Novo Framework: PILOT
Peptide identification via Integer Linear Optimization and Tandem mass spectrometry
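The peak and path decisions can be illustrated with a small spectrum graph: two peaks are joined when their mass difference matches an amino acid residue mass, and a sequence corresponds to a path through selected peaks. The sketch below enumerates paths by depth-first search instead of solving a MILP, so it is not the PILOT formulation. The peak list is hypothetical, ion-type offsets are ignored, and only a few residues are included to keep it short.

```python
# Monoisotopic residue masses in Da for a handful of amino acids.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "V": 99.06841, "L": 113.08406}


def spectrum_edges(peaks, tol=0.01):
    """Return (i, j, residue) for peak pairs whose mass gap matches a residue.

    peaks must be sorted in ascending order.
    """
    edges = []
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            gap = peaks[j] - peaks[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    edges.append((i, j, aa))
    return edges


def read_paths(n_peaks, edges):
    """Enumerate residue strings along all paths from the first to the last peak."""
    adjacency = {}
    for i, j, aa in edges:
        adjacency.setdefault(i, []).append((j, aa))
    paths = []

    def walk(node, letters):
        if node == n_peaks - 1:
            paths.append("".join(letters))
            return
        for nxt, aa in adjacency.get(node, []):
            walk(nxt, letters + [aa])

    walk(0, [])
    return paths


# Hypothetical prefix-mass ladder for G-A-V plus one noise peak at 150.0.
peaks = [0.0, 57.02146, 128.05857, 150.0, 227.12698]
edges = spectrum_edges(peaks)
candidates = read_paths(len(peaks), edges)
print(candidates)
```

In the binary-variable view, each edge used by the chosen path corresponds to w_ij = 1 and each peak on the path to p_i = 1; the noise peak is simply left unselected.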
Challenges and Opportunities
• Develop a De Novo computational approach based on a novel Mixed-Integer Linear Optimization (MILP) framework for peptide identification using only information of the ion peaks in the spectrum
• Develop a hybrid method in combination with database methods
• Develop a novel approach which will account for experimental uncertainty
• Develop computational methods for protein identification
• Develop approaches for predicting protein-protein interactions in a complex mixture of proteins using tandem MS/MS and protein cross-linking technology
Process Operations: Scheduling & Planning (1996-)
Process Operations: Scheduling & Planning
How/When did we start?
Suggestion of Prof. R.W.H. Sargent, Imperial College, Fall 1992.
Motivation: Are Continuous-Time Formulations Effective for Short-Term Scheduling?
Initial Studies: 1996-98
Process and Product Operations: Scheduling, Planning & Uncertainty
• Given:
– Production in terms of task sequences
– Pieces of equipment and their ranges of capacities
– Intermediate storage capacity
– Production requirement
– Time horizon under consideration
• Determine:
– Optimal sequence of tasks taking place in each unit
– Amount of material processed at each time in each unit
– Processing time of each task in each unit
• so as to optimize a performance criterion,
– Maximization of production, minimization of makespan, etc.
Process Operations: Scheduling - Advances
• From Discrete-Time to Continuous-Time Scheduling Approaches
– Significant reduction of binary variables (combinatorial complexity)
– Rolling horizon approaches
– Decomposition methods
• Periodic scheduling
Floudas & Lin (2004a): C&ChE; Floudas & Lin (2005): Annals of OR
Global Event Based Models
Unit-Specific Event Based Models
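The reduction in binary variables can be made concrete with a rough count. The sketch assumes that a discrete-time model carries one assignment binary per (task, unit) pair per time slot, while a continuous-time event-based model carries one per (task, unit) pair per event point. The problem sizes below are hypothetical and the function names are illustrative; actual formulations differ in detail.

```python
import math


def binaries_discrete(n_task_unit_pairs, horizon_hours, slot_hours):
    """Assignment binaries when time is divided into uniform slots."""
    n_slots = math.ceil(horizon_hours / slot_hours)
    return n_task_unit_pairs * n_slots


def binaries_continuous(n_task_unit_pairs, n_event_points):
    """Assignment binaries when tasks start at a small set of event points."""
    return n_task_unit_pairs * n_event_points


# Hypothetical instance: 50 task-unit pairs over a two-week (336 h) horizon.
discrete = binaries_discrete(50, horizon_hours=336, slot_hours=1)
continuous = binaries_continuous(50, n_event_points=10)
print(discrete, continuous)  # 16800 500
```

The discrete count grows with the horizon length and the slot resolution, whereas the continuous count grows only with the number of event points needed to describe the schedule.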
Short-Term, Medium-Term and Reactive Scheduling of an Industrial Polymer Compounding Plant
Plant Data Description
• Over 80 different products considered in time horizon (250 overall)
• Over 85 orders in nominal schedule and over 65 orders added in reactive schedule
• Basic operations: reaction, filtering, storage, filling
• Units: reactors, filters, prill tower, swing and product tanks, filling stations (85 units)
• Scheduling horizon: ~ 2 weeks
• Storage limitations on reactors and tanks
• Campaign mode production for prill tower and associated units
• Additional considerations:
– Clean-up times for each unit switching between tasks
– Demands with intermediate due dates
– Different types of final products
Process Alternatives: Polymer Compounding Plant
State-Task Network (STN) Representation
F Type 1 I1 Type 6 P
F Type 1 I1 Type 4a P Type 6 P
F Type 1 I1 Type 4b P Type 6 P
F Type 1 I1 P Type 4b Type 6 P Type 3 I2
F Type 2 I1 Type 4a P Type 6 P
F Type 1 I1 P Type 5 Type 6 P Type 4a I2
F Type 2 I1 Type 4a I2 Type 4a P Type 6 P
Mathematical Framework
• Decompose the large and complex problem for a long time period into smaller short-term scheduling sub-problems in successive time horizons.
• Decomposition determines each time horizon as well as the products to include based on:
– Number of products with demands
– Complexity of corresponding process recipes
– Resulting computational complexity
• Connection between consecutive time horizons:
– Available starting time of units
– Available intermediate materials
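A much simplified picture of the decomposition step is sketched below. Orders are sorted by due date and grouped into successive horizons, and a new horizon starts whenever adding an order would exceed a cap on the number of distinct products. Only the first criterion listed above (number of products) is modeled; recipe complexity and computational complexity are ignored. The order data, the cap, and the function name are hypothetical.

```python
def split_horizons(orders, max_products):
    """Group orders into successive horizons with at most max_products products each.

    orders is a list of (product, due_hour) pairs; the result is a list of
    horizons, each a list of orders in due-date order.
    """
    horizons, current, products = [], [], set()
    for product, due in sorted(orders, key=lambda order: order[1]):
        # Close the current horizon if a new product would exceed the cap.
        if product not in products and len(products) == max_products:
            horizons.append(current)
            current, products = [], set()
        current.append((product, due))
        products.add(product)
    if current:
        horizons.append(current)
    return horizons


# Hypothetical orders: (product, due time in hours).
orders = [("P1", 24), ("P2", 30), ("P1", 48), ("P3", 60), ("P4", 72), ("P2", 90)]
horizons = split_horizons(orders, max_products=2)
for index, horizon in enumerate(horizons, start=1):
    print(index, horizon)
```

Each horizon would then be solved as a short-term scheduling sub-problem, with the unit availability times and leftover intermediate materials from one horizon passed on as starting conditions for the next.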
Industrial Polymer Compounding Plant: Case Study 2
• Campaign Mode Production determined first
• Nominal and Reactive Scheduling performed
• Time horizons considered: 18 days
• Constraint to limit lateness of orders to be <= 24 hours
Dr. J. Kallrath, BASF; Dr. A. Schreieck, BASF; Stacy Janak
Case 2: Nominal: Process Units (18 days)
Case 2: Nominal: Storage Units (18 days)
Case 2: Results Summary (18 days)
Production (tons)
Schedule     Demand       Production    % Increase
Nominal      2323.191     2799.119      20.49
Reactive     2264.578     3066.419      35.41
Profit Value
Value of demand    Profit       % Increase
202774.07          244006.16    20.33
171556.64          242128.48    41.14
Extra Time (hr.): 1504.47; 1848.11
• In both cases, production is increased significantly compared to the required demand. The value of the profit also increased compared to the value of the required demand.
• In the reactive schedule, the overall demand has decreased compared to the nominal schedule, but the production and profit have increased.
• The extra time is the total time available for additional production in all the reactors where blocks of time must be 11 hours or greater.
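The percentage increases in the results summary can be reproduced from the reported figures. In the sketch below, the production pairs carry the labels given on the slide, while the pairing of the four profit values into (value of demand, profit) couples is inferred from the arithmetic and is an assumption.

```python
def percent_increase(base, value):
    """Percentage increase of value over base."""
    return 100.0 * (value - base) / base


# Production in tons: (demand, production), labeled as on the slide.
nominal_production = percent_increase(2323.191, 2799.119)
reactive_production = percent_increase(2264.578, 3066.419)

# Profit: (value of required demand, profit); pairing inferred, not stated.
profit_a = percent_increase(202774.07, 244006.16)
profit_b = percent_increase(171556.64, 242128.48)

for label, pct in [("nominal production", nominal_production),
                   ("reactive production", reactive_production),
                   ("profit pair A", profit_a),
                   ("profit pair B", profit_b)]:
    print(f"{label}: {pct:.2f} %")
```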
Challenges/Opportunities
• Modeling to reduce/close the integrality gap
• New/Improved Methods for Medium-term scheduling
• Multi-site production scheduling
• Reactive Scheduling
• Scheduling under Uncertainty
• Design/Synthesis and Scheduling under Uncertainty
• Planning and Scheduling
• Planning under Uncertainty
• Validation/Application to Manufacturing Operations
Acknowledgements
Aristotle University of Thessaloniki
Undergraduate Studies