3/7/2003 Bioinformatics
How To Address Rapidly Changing Data Representations in an Evolving Scientific Domain Using Aspect-oriented Programming Techniques + Overview of Bioinformatics at NEU.
Karl Lieberherr
College of Computer and Information Science
Northeastern University
Boston
Motivation
From: Computational Challenges in Structural and Functional Genomics by J. Head-Gordon, IBM Systems Journal, Vol. 40, No. 2, 2001.
Some Quotes From Head-Gordon.
Although warehousing techniques are as vital in the sciences as in business, functional warehouses tailored for specific scientific needs are few and far between.
A key technical reason for this discrepancy is that our understanding of the concepts being explored in an evolving scientific domain changes constantly, leading to rapid changes in data representation.
Some Quotes From Head-Gordon (Refinement).
… evolving scientific domain changes constantly, leading to rapid changes in data representation.
Not only the data representation changes but also the interfaces, so we need protection against interface changes as well.
Examples: additional or modified fields or arguments; additional or modified types.
More Quotes From Head-Gordon.
When the format of source data changes, the warehouse must be updated to read that source or it will not function properly. The bulk of these modifications involve extremely tedious, low-level translation and integration tasks that typically require the full attention of both database and domain experts. Given the lack of the ability to automate this work, warehouse maintenance costs are prohibitive, and warehouse “up-times” severely restricted.
Protect Against Changes.
Protection against changes in data representation and interfaces. Traditional technique: information hiding is good for protecting against changes in data representation. It does not help with changes to interfaces.
Need more than information hiding to protect against interface changes: restriction through shy programming, called Adaptive Programming (AP).
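[Figure: Implementation | Interface | Client. Information hiding separates the implementation from the interface; shy programming separates the interface from the client.]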
Problem with Information Hiding
Shy Programming builds on the observation that traditional black-box composition is not restrictive enough. We use the slogan: information hiding is not hiding enough. Black-box composition isolates the implementation from the interface, but does not decouple the interface from its clients.
Cover unimportant parts of the interface
To permit interfaces to evolve, self-discipline is required to refrain from programming extensively against the interface. Certain parts of the interface are best treated as if they were covered.
This disciplined programming is referred to as shy programming. Shy programming lets the program recover from (or adapt to) interface changes. Shy programming is also called Adaptive Programming (AP). This is similar to the shyness metaphor in the Law of Demeter (LoD): structure evolves over time, so communicate with just a subset of the visible objects.
We summarize the commonalities and differences between black-box composition and Shy Programming in two principles.
– Black-box Principle: the representation of objects can be changed without affecting clients.
– Shy-Programming Principle: the interface of objects can be changed within certain parameters without affecting clients.
It is important to notice that the Shy-Programming Principle builds on top of the Black-box Principle.
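A minimal Java sketch of the Black-box Principle, assuming a hypothetical `DnaSequence` interface (not from the slides): the client `gcContent` is unaffected whether the bases are stored as a `String` or packed two bits per base. If the interface itself changed (say `baseAt` gained a parameter), `gcContent` would have to change too; that remaining gap is what the Shy-Programming Principle targets.

```java
public class BlackBoxSketch {
    // Hypothetical interface that clients program against.
    interface DnaSequence {
        int length();
        char baseAt(int i);
    }

    // Representation 1: a plain String.
    static final class StringSequence implements DnaSequence {
        private final String bases;
        StringSequence(String bases) { this.bases = bases; }
        public int length() { return bases.length(); }
        public char baseAt(int i) { return bases.charAt(i); }
    }

    // Representation 2: 2 bits per base, four bases per byte.
    static final class PackedSequence implements DnaSequence {
        private static final String CODE = "ACGT";
        private final byte[] packed;
        private final int length;

        PackedSequence(String bases) {
            length = bases.length();
            packed = new byte[(length + 3) / 4];
            for (int i = 0; i < length; i++) {
                int code = CODE.indexOf(bases.charAt(i));
                if (code < 0) throw new IllegalArgumentException("not A/C/G/T: " + bases.charAt(i));
                packed[i / 4] |= (byte) (code << (2 * (i % 4)));
            }
        }

        public int length() { return length; }

        public char baseAt(int i) {
            if (i < 0 || i >= length) throw new IndexOutOfBoundsException("index " + i);
            // Shift the 2-bit code down and mask it; sign extension is removed by "& 3".
            return CODE.charAt((packed[i / 4] >> (2 * (i % 4))) & 3);
        }
    }

    // Client code: depends only on the interface, not on the representation.
    static double gcContent(DnaSequence s) {
        int gc = 0;
        for (int i = 0; i < s.length(); i++) {
            char b = s.baseAt(i);
            if (b == 'G' || b == 'C') gc++;
        }
        return s.length() == 0 ? 0.0 : (double) gc / s.length();
    }

    public static void main(String[] args) {
        String dna = "ACGTGGCA";
        System.out.println(gcContent(new StringSequence(dna))); // 0.625
        System.out.println(gcContent(new PackedSequence(dna))); // 0.625
    }
}
```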
A manager M manages a set of group leaders G, each managing a set of workers W. We consider issues related to informing M and requesting information from M. We use this example to illustrate three points.
– Micromanager – no information restriction.
– Shyness – helps information restriction.
– Complex requests – help information restriction and optimization.
We want to learn how to organize bioinformatics knowledge.
Micromanager – no information restriction.
– If the manager is a micromanager (a manager who wants to know about and rely on all the details of the workers' projects), the managing approach is brittle: whenever a detail of one of the workers' projects changes, the manager needs to be notified.
Micromanager – no information restriction (continued).
– An object-oriented program written in the usual way corresponds to the manager who likes to micromanage: it is full of detailed knowledge of the class graph. An alternative way of formulating the same idea is to observe that it is good when the workers are shy. A shy worker shares only minimal, high-level information with the group leader, which prevents a brittle situation where the group leaders and the manager rely on too much detail.
Shyness – helps information restriction.
– It is good for the workers to be shy and only talk to their group leader, not to the manager directly. (Shyness has two facets: talk only to a few friends AND share minimal information with them. Here we use the first facet, while in the previous point we used the second.) The group leader abstracts the information from the workers and passes only the abstract information on to the manager. This prevents the manager from micromanaging. This variant can be viewed as an application of the Law of Demeter (LoD), which states that an object should talk only to closely related objects. The closely related object for a worker is the group leader, not the manager.
Shyness – helps information restriction (continued).
– The motivation is that when things change at the worker level, the manager does not necessarily have to be informed. The group leader will be informed and will decide whether the information needs to be passed up.
Complex requests – help information restriction and optimization.
– The manager does not want to be bothered by many simple requests from the many workers. Instead, the manager prefers to get a complex request from a group leader from time to time. The complex request lets the manager see all the requests as a whole and optimize the overall result, which would not be possible if simple requests came one by one and had to be satisfied immediately, before the totality of all simple requests is seen.
Complex requests – help information restriction and optimization (continued).
– The same point applies to programming: instead of sending an object many individual data access requests, it is better to send one complex request that can be treated as a whole and optimized accordingly.
AOP is programming with aspects. An aspect is a complex request to modify the execution of a program, and it may expose a large interface. It can be implemented efficiently by inserting code into the program at compile time. An aspect should be shy with respect to the program it modifies.
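As a rough illustration of an aspect that stays shy with respect to the program it modifies, the Java sketch below counts calls to every method of an interface without naming any of them. Note the swapped-in technique: it uses `java.lang.reflect.Proxy` to intercept calls at run time, which differs from the compile-time code insertion described above (as done, for example, by AspectJ). The `Warehouse` interface and `withCallCounting` helper are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AspectSketch {
    // Hypothetical base-program interface.
    interface Warehouse {
        void load(String record);
        int size();
    }

    static class ListWarehouse implements Warehouse {
        private final List<String> records = new ArrayList<>();
        public void load(String record) { records.add(record); }
        public int size() { return records.size(); }
    }

    // A call-counting "aspect": it names no method of Warehouse, so methods can be
    // added or renamed without changing this code.
    @SuppressWarnings("unchecked")
    static <T> T withCallCounting(T target, Class<T> iface, Map<String, Integer> counts) {
        InvocationHandler handler = (proxy, method, args) -> {
            counts.merge(method.getName(), 1, Integer::sum);
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the base program's own exception
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[]{iface}, handler);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        Warehouse w = withCallCounting(new ListWarehouse(), Warehouse.class, counts);
        w.load("a");
        w.load("b");
        w.size();
        System.out.println(counts); // {load=2, size=1}
    }
}
```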
Lessons From Manager Metaphor.
Information hiding does not hide enough. Information hiding makes all public interfaces available, whereas (Micromanager) makes the point that only an abstraction of those interfaces should be visible at higher levels.
Lessons From Manager Metaphor (Continued).
In Shy Programming, only high-level information about the class or call graph is visible at the (shy) programming level, and this shields the program from many changes to the class or call graph in the same way as the manager is shielded from many of the changes in the workers' projects. The role of the group leader is played by the glue code that maps high-level information to low-level information and vice versa. Shy Programming is graph-shy.
Application to Bioinformatics Knowledge
Need shy programming and shy knowledge representation techniques for bioinformatics.
Need domain-specific languages to define function in a structure-shy way.
Writing Aspect-oriented Programs With Strategies.
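// Uses the DJ library; Main.cg is assumed to be a ClassGraph object defined elsewhere.
class BusRoute {
  // Traversal strategy: mentions only BusRoute, BusStop and Person.
  static final String WPStrategy = "from BusRoute through BusStop to Person";

  int countWaitingPersons() {
    Integer result = (Integer) Main.cg.traverse(this, WPStrategy, new Visitor() {
      int r;
      public void start() { r = 0; }            // reset the counter when the traversal starts
      public void before(Person host) { r++; }  // count every Person reached along the strategy
      public Object getReturnValue() { return Integer.valueOf(r); }
    });
    return result.intValue();
  }
}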
The strategy string is a complex request: it plays the role of the manager and is class-graph shy.
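To show what "class-graph shy" buys without depending on DJ, here is a self-contained Java sketch that mimics the strategy "from BusRoute through BusStop to Person" with plain reflection. It is not DJ's implementation: the `countThrough` helper, the extra `Village` and `Bus` classes, and the assumption of a tree-shaped (acyclic) object structure are all hypothetical. The call site names only the source, the via class, and the target, so inserting or removing an intermediate class such as `Village` does not change it.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class TraversalSketch {
    // Hypothetical class graph: BusRoute -> Village -> BusStop -> Person, and BusRoute -> Bus -> Person.
    static class Person { }
    static class BusStop { final List<Person> waiting = new ArrayList<>(); }
    static class Village { final List<BusStop> stops = new ArrayList<>(); }
    static class Bus { final List<Person> passengers = new ArrayList<>(); }
    static class BusRoute {
        final List<Bus> buses = new ArrayList<>();
        final List<Village> villages = new ArrayList<>();
    }

    // Counts objects of class 'to' reachable from 'root' along a path that passes an object of class 'through'.
    // Assumes the object structure has no cycles.
    static int countThrough(Object root, Class<?> through, Class<?> to) {
        return walk(root, through, to, false);
    }

    private static int walk(Object o, Class<?> through, Class<?> to, boolean passed) {
        if (o == null) return 0;
        if (to.isInstance(o)) return passed ? 1 : 0;       // traversal stops at the target class
        boolean nowPassed = passed || through.isInstance(o);
        int count = 0;
        if (o instanceof Iterable) {                         // descend into collections element by element
            for (Object e : (Iterable<?>) o) count += walk(e, through, to, nowPassed);
            return count;
        }
        if (o.getClass().getName().startsWith("java.")) return 0; // do not descend into library objects
        for (Class<?> c = o.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) continue;
                f.setAccessible(true);
                try {
                    count += walk(f.get(o), through, to, nowPassed);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BusRoute route = new BusRoute();
        Bus bus = new Bus();
        bus.passengers.add(new Person());                    // riding a bus, not waiting at a stop
        route.buses.add(bus);
        Village village = new Village();
        BusStop s1 = new BusStop();
        s1.waiting.add(new Person());
        s1.waiting.add(new Person());
        BusStop s2 = new BusStop();
        s2.waiting.add(new Person());
        village.stops.add(s1);
        village.stops.add(s2);
        route.villages.add(village);
        System.out.println(countThrough(route, BusStop.class, Person.class)); // 3
    }
}
```

The bus passenger is not counted because its path from `BusRoute` never passes a `BusStop`.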
Aspect-oriented software development helps to create software that
– is more flexible and adapts easily to rapidly changing interfaces;
– is easier to understand and also shorter;
– supports the Shy Programming Principle.
– Aspect-Oriented Software Development
– Software Components
– Parallelism
– Domain Specific Languages
– Visualization
– Knowledge-Based Support Systems
THEMATICS (M. Ondrechen; protein function from structure; high external visibility)
– Proc. Natl. Academy of Sciences publication
– Featured in popular scientific magazines: Nature, American Chemical Society, Science Daily
Subsurface Sensing and Imaging (many Institute participants from this area)
Parallel Geant4 (CERN; Cooperman, Reucroft and Swain; particle-matter interaction; million-line program)
Roger Giese.
– The long-term goal is to learn whether the measurement of DNA adducts in people can help to individualize cancer prevention, analogous to the measurement of cholesterol as a biomarker for risk of a heart attack.
Some Other Faculty Highlights.
Bob Futrelle.
– I'm particularly interested in the relations between bio-ontologies and text and diagrams.
Northeastern University and the Institute for Complex Scientific Software create knowledge of significant interest to bioinformatics.
Aspect-Oriented Software Development is a useful technology for the rapidly evolving area of bioinformatics.
We have developed an efficient graph search algorithm that solves the following problem:
Input:
– Graph G1 = (V1, E1) with source s and target t.
– Graph G2 = (V2, E2), where V1 is a subset of V2.
Question: Does G2 contain a path that is an expansion of a path in G1 from s to t? (The algorithm works even if s and t are sets of nodes.)
Given a path p, a path p' is called an expansion of p if p' can be obtained by inserting one or more elements between elements of p.
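The definition can be checked directly. In this minimal Java sketch (the `isExpansion` name is hypothetical), p' must start and end with the same elements as p, and the inner elements of p must appear in order among the inner elements of p'; a greedy left-to-right scan suffices for such a subsequence test. As an assumption, the sketch also accepts p' equal to p (zero insertions).

```java
import java.util.List;

public class ExpansionSketch {
    // True if pPrime can be obtained from p by inserting elements between elements of p.
    // Zero insertions (pPrime equal to p) are also accepted in this sketch.
    static <T> boolean isExpansion(List<T> p, List<T> pPrime) {
        if (p.isEmpty() || pPrime.size() < p.size()) return false;
        if (p.size() == 1) return pPrime.equals(p);  // nothing lies between elements of a one-element path
        int n = p.size(), m = pPrime.size();
        if (!p.get(0).equals(pPrime.get(0)) || !p.get(n - 1).equals(pPrime.get(m - 1))) return false;
        int i = 1;                                   // next inner element of p still to be matched
        for (int j = 1; j < m - 1 && i < n - 1; j++) {
            if (pPrime.get(j).equals(p.get(i))) i++; // greedy matching works for subsequence tests
        }
        return i == n - 1;
    }

    public static void main(String[] args) {
        List<String> p = List.of("s", "a", "t");
        System.out.println(isExpansion(p, List.of("s", "x", "a", "y", "t"))); // true
        System.out.println(isExpansion(p, List.of("s", "x", "t")));           // false
    }
}
```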
More generally, we can find a third graph that succinctly represents all possible such paths in G2.
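For readers who want to experiment with the decision question, here is a straightforward Java sketch. It is a breadth-first search over pairs (current node in G2, last G1 node matched so far), not necessarily the efficient algorithm the slides refer to, and it treats s and t as single nodes. From each pair, a G2 edge either matches the next node of a G1 path (which requires a G1 edge from the last matched node) or adds a "noise" node. With constant-time hash lookups it explores at most |V1| · |V2| states and O(|V1| · |E2|) transitions. The adjacency-map representation and the method name are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ExpansionSearchSketch {
    // Graphs are adjacency maps: node -> set of successors (hypothetical representation).
    // Returns true if some s-t path in g2 is an expansion of some s-t path in g1.
    static boolean hasExpandedPath(Map<String, Set<String>> g1, Map<String, Set<String>> g2,
                                   String s, String t) {
        if (s.equals(t)) return true;                // the one-node path s
        // A state is [current node in G2, last G1 node matched so far].
        Deque<List<String>> queue = new ArrayDeque<>();
        Set<List<String>> seen = new HashSet<>();
        List<String> start = List.of(s, s);
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            List<String> state = queue.poll();
            String at = state.get(0), matched = state.get(1);
            for (String w : g2.getOrDefault(at, Set.of())) {
                // Either w is the next node of the G1 path (needs the G1 edge matched -> w) ...
                if (g1.getOrDefault(matched, Set.of()).contains(w)) {
                    if (w.equals(t)) return true;
                    List<String> next = List.of(w, w);
                    if (seen.add(next)) queue.add(next);
                }
                // ... or w is an inserted "noise" node and the matched G1 node stays the same.
                List<String> noise = List.of(w, matched);
                if (seen.add(noise)) queue.add(noise);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g1 = Map.of("s", Set.of("a"), "a", Set.of("t"));
        Map<String, Set<String>> g2 = Map.of("s", Set.of("x"), "x", Set.of("a"),
                                             "a", Set.of("y"), "y", Set.of("t"));
        System.out.println(hasExpandedPath(g1, g2, "s", "t")); // true: s x a y t expands s a t
    }
}
```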
Do you see applications of such an algorithm in biology?
G1 is a “small” graph that lists “important” nodes.
G2 is a “large” graph in which we want to recognize paths that are expansions of paths in the “small” graph.
Expansions of paths may contain additional nodes that are “noise” nodes.
Lessons From Manager Metaphor (Continued).
AOP is related to (Micromanager) through the observation that aspects should be loosely coupled to the base programs they modify. The aspect should not be brittle with respect to the detailed calling structure of the base program, in the same way as the manager should not rely on the details of the workers' projects. There is an intermediary, called glue code, that maps the aspect to the detailed usage context. AOP is call-graph shy.