Building a Massive Virtual Screening using Grid Infrastructure
Chak Sangma, Centre for Cheminformatics, Kasetsart University
Putchong Uthayopas, High Performance Computing and Networking Center, Kasetsart University
Motivation
• Thailand’s medicinal plants are important to Thai society
– Over 1,000 species
– Over 200,000 compounds
– Multiple disease targets
• Problem
– No complete compound database
– Practice still relies mostly on local knowledge and conventional wisdom
– Lack of systematic verification by scientific methods
[Figures: Asiatic pennywort; Barleria lupulina Lindl.]
Kasetsart University Thai Medicinal Plants Effort
• Led by the Center for Cheminformatics, Kasetsart University (Dr. Chak Sangma)
• Current portal is built using Plone (http://www.plone.org/)
– Python-based web content management
– Flexible and extensible
How things work!
[Diagram labels: Portal; Resource Broker (SQMS/G); Grid Middleware (Globus 2.4); KU campus network; five Compute Resources; Tasks; Monitor]
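The diagram suggests a broker that farms screening tasks out to compute resources while a monitor tracks them. The following is a minimal sketch of that idea, not the actual SQMS/G logic: the `Task` class, the round-robin policy, and the host and compound names are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import cycle, product


@dataclass
class Task:
    """One docking job: a single compound against a single drug target."""
    compound: str
    target: str
    host: str = ""
    state: str = "queued"


def make_tasks(compounds, targets):
    """Create one task per (compound, target) pair."""
    return [Task(c, t) for c, t in product(compounds, targets)]


def assign_round_robin(tasks, hosts):
    """Assign tasks to hosts in turn (hypothetical policy, chosen for simplicity)."""
    for task, host in zip(tasks, cycle(hosts)):
        task.host = host
        task.state = "submitted"
    return tasks


def monitor(tasks):
    """Count tasks per state, as a monitoring page might display."""
    return Counter(task.state for task in tasks)


if __name__ == "__main__":
    tasks = make_tasks(["cmpd-0001", "cmpd-0002", "cmpd-0003"], ["HIV-RT", "HIV-PR"])
    assign_round_robin(tasks, ["node-a", "node-b"])
    for task in tasks:
        print(task)
    print(monitor(tasks))
```

Round-robin is only reasonable when the compute resources are similar; a real broker would also account for load and failures.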
Results
• The first version of the compound database (around 3,000 compounds)
• 3,000 compounds screened (30 high-potential compounds found)
– 4 drug targets (Influenza, HIV-RT, HIV-PR, HIV-IN)
[Figure: XK-263]
Experiences
• Some files, such as enzyme structures and outputs, are very large
– Require good bandwidth between sites
– Some simple optimization techniques can help
  • Caching the enzyme structure file at target hosts substantially reduces the number of transfers needed
• A batch scheduling approach works well if the systems are very homogeneous
– Allows dynamic staging of execution code to the target host without installation/recompilation
• Many script tools must be developed to
– Streamline the execution
– Handle data and code staging
– Clean up after execution
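The caching idea above can be illustrated with a small sketch. It assumes each target host keeps a local cache directory and that the real transfer mechanism can be wrapped in a callable; `StructureCache`, `fetch`, and the file name are hypothetical names, not part of the original tools.

```python
import tempfile
from pathlib import Path


class StructureCache:
    """Keeps one local copy of each enzyme structure file on a target host."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch      # callable(name) -> bytes; stands in for the real transfer
        self.transfers = 0      # number of actual transfers performed

    def get(self, name):
        """Return the local path for `name`, transferring it only on a cache miss."""
        local = self.cache_dir / name
        if not local.exists():
            local.write_bytes(self.fetch(name))
            self.transfers += 1
        return local


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The lambda fakes a remote transfer; a real one would copy the file over the network.
        cache = StructureCache(tmp, fetch=lambda name: b"placeholder structure data")
        for _ in range(100):    # 100 docking tasks that need the same enzyme file
            cache.get("enzyme.pdb")
        print(cache.transfers)  # one transfer instead of 100
```

This sketch never invalidates entries, so it only suits input files that do not change during a screening run.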
Next Generation Massive Screening on Grid
• Move to a service-oriented Grid
– Use Grid and Web services to encapsulate key applications
– Build broker and service discovery infrastructure
– Rely heavily on OGSA and GT 3.x, 4.x
• Portlet-based portal
– JSR 168 (Portlet Specification) compliance
– More modular, customizable, flexible
– Plan to adopt GridSphere from GridLab (www.gridlab.org)
• Use a database as the backend instead of files
– OGSA-DAI might be used for data access
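To make the broker and service discovery idea concrete, here is a minimal in-memory sketch. It does not use any OGSA or Globus Toolkit API; `ServiceRegistry`, `Broker`, the service names, and the endpoint URLs are assumptions for illustration only.

```python
from collections import defaultdict


class ServiceRegistry:
    """Maps a service name to the endpoints that currently offer it."""

    def __init__(self):
        self._endpoints = defaultdict(list)

    def register(self, service, endpoint):
        """Record that `endpoint` offers `service` (duplicates are ignored)."""
        if endpoint not in self._endpoints[service]:
            self._endpoints[service].append(endpoint)

    def discover(self, service):
        """Return a copy of the endpoints registered for `service`."""
        return list(self._endpoints.get(service, []))


class Broker:
    """Picks an endpoint for a requested service, rotating between candidates."""

    def __init__(self, registry):
        self.registry = registry
        self._next = defaultdict(int)

    def select(self, service):
        endpoints = self.registry.discover(service)
        if not endpoints:
            raise LookupError(f"no endpoint registered for {service!r}")
        index = self._next[service] % len(endpoints)
        self._next[service] += 1
        return endpoints[index]


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("docking", "http://node-a.example/docking")   # placeholder URLs
    registry.register("docking", "http://node-b.example/docking")
    broker = Broker(registry)
    for _ in range(3):
        print(broker.select("docking"))
```

Keeping registration separate from selection means the selection policy can change later without touching the services themselves.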
Progress
• We are working on
– New portal using GridSphere technology (done, testing)
– Service wrapper for legacy code
  • GAMESS, AutoDock (done, testing)
– MMJFS interface (in progress)
– OGSA-DAI integration (in progress)
– Service registration and discovery (partial)
– Broker system (design)
– New monitoring (done)
• Schedule
– Finish and test: Jan-Feb 2005
– Deploy: March 2005
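The "service wrapper for legacy code" item can be pictured as a thin layer that runs an unmodified command-line program and returns its result in a structured form. The sketch below is an assumption about the general shape of such a wrapper, not the authors' implementation; `LegacyCodeService` is a hypothetical name, and the Python interpreter stands in for a real executable so the example runs anywhere.

```python
import subprocess
import sys


class LegacyCodeService:
    """Wraps a command-line program so callers see a simple run() interface."""

    def __init__(self, command):
        self.command = list(command)    # base command, e.g. the path to the executable

    def run(self, args, timeout=60):
        """Run the program with extra arguments and capture its output."""
        result = subprocess.run(
            self.command + list(args),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {
            "returncode": result.returncode,
            "stdout": result.stdout,
            "stderr": result.stderr,
        }


if __name__ == "__main__":
    # Stand-in for a legacy binary: a one-line Python program that echoes its argument.
    service = LegacyCodeService(
        [sys.executable, "-c", "import sys; print('processed', sys.argv[1])"]
    )
    print(service.run(["input.dat"]))
```

A Grid or Web service front end would then expose `run()` remotely, leaving the legacy program itself untouched.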
[Diagram labels: Scheduler; MMJFS; Gamess; Gamess Service; File Server; Portal; Portlet; OGSA-DAI; Broker Server; Registration Server; Backend DB; Molecular DB; GridFTP]
Design Choices
• Mass data transportation across sites
– A central FTP server is used to store data/databases
– Each compute node can pull required data from this