30/11/14. A solution for managing Big Data in Computational Chemistry. TSIUC'14, Universitat Autònoma de Barcelona, 2-XII-2014. Carles Bo, ICIQ - URV, [email protected]. Computational Chemistry.

ioChem-BD, a solution for managing Big Data in Computational Chemistry

Aug 13, 2015

Transcript
Page 1: ioChem-BD, a solution for managing Big Data in Computational Chemistry

30/11/14

A solution for managing Big Data in Computational Chemistry

TSIUC'14, Universitat Autònoma de Barcelona, 2-XII-2014

Carles Bo, ICIQ - URV

[email protected]

Computational Chemistry

Page 2: ioChem-BD, a solution for managing Big Data in Computational Chemistry

Computational Chemistry

Taking experiment to cyberspace

Nobel Prize Chemistry 2013 (1981, 1998)

NOBEL PRIZE IN CHEMISTRY 2013: POPULAR SCIENCE BACKGROUND

Taking the experiment to cyberspace

Chemical reactions occur at lightning speed; electrons jump between atoms hidden from the prying eyes of scientists. The Nobel Laureates in Chemistry 2013 have made it possible to map the mysterious ways of chemistry by using computers. Detailed knowledge of chemical processes makes it possible to optimize catalysts, drugs and solar cells.

Chemists all over the world devise and carry out experiments on their computers on a daily basis. With the help of the methods that Martin Karplus, Michael Levitt and Arieh Warshel began to develop in the 1970s, they examined every tiny little step in complex chemical processes that are invisible to the naked eye.

In order for you, the reader, to get an idea of how mankind can benefit from this, we begin with an example. Put your lab coat on, because we have a challenge for you: to create artificial photosynthesis. The chemical reaction occurring in green leaves fills the atmosphere with oxygen and is one prerequisite for life on Earth. But it is also interesting from an environmental perspective. If you can mimic the photosynthesis you will be able to create more efficient solar cells. When water molecules are split oxygen is created, but also hydrogen that could be used to power our vehicles. So there is ample reason for you to get engaged in this project. If you succeed, you could contribute to solving the problem with greenhouse effect.

Nobel Prize® is a registered trademark of the Nobel Foundation.

Figure 1. Today chemists experiment just as much on their computers as they do in their labs. Theoretical results from computers are confirmed by real experiments that yield new clues to how the world of atoms works. Theory and practice cross-fertilize each other.

Permanent storage. Certify results. Re-use results.

Page 3: ioChem-BD, a solution for managing Big Data in Computational Chemistry

Our Big Data Problem (1)

Help researchers in their daily tasks (manage & store results, apps & tools)

Our Big Data Problem (2)

Manage files of former group members

Page 4: ioChem-BD, a solution for managing Big Data in Computational Chemistry

Our Big Data Problem (3)

Supporting Information files. Certify results. Reuse results.

Yes, Comp Chem is a Big Data Problem

Page 5: ioChem-BD, a solution for managing Big Data in Computational Chemistry

5 ★ Open Data (Tim Berners-Lee)

OL: Open license; OF: Open format; LD: Linked; RE: Readable data; URI: Accessible

[Diagram: Present]
Scientists: submit jobs
HPC: files, terabytes, >95% waste
Data collection: manual
Reports (pdf files): manual
Publishers: files
Public: information

Page 6: ioChem-BD, a solution for managing Big Data in Computational Chemistry

[Diagram: Future]
Scientists: submit jobs through workflows
Cloud HPC: HPC on demand
Data collection: automated
Reports: XML, automated
Results databases: XML
Publishers: information
Public: files, information

[Diagram: ioChem-BD]
Scientists: submit jobs
HPC
Data collection: manual
Reports: XML, automated
Results databases: XML
Publishers: files
Public: files, information

Page 7: ioChem-BD, a solution for managing Big Data in Computational Chemistry

5 ★ Open Data (Tim Berners-Lee)

Present

ioChem-BD

Definition: ioChem-BD is a digital repository designed to manage and store computational chemistry files (inputs and outputs). It fills the gap between the generation of results and the publication of manuscripts, and raises data to 5★ quality.

Created by the fusion of previous projects:

Page 8: ioChem-BD, a solution for managing Big Data in Computational Chemistry

Goals

• Build a distributed database of computational chemistry results: reduce size and increase value.
• Set a common data standard among all quantum chemistry legacy formats (XML - CML).
• Become a daily tool for data management, search and manipulation.
• Redefine workflows: storing and publishing results, open data.
• Be open to adding future functionalities for data manipulation and analysis.
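To illustrate what a common XML - CML representation of a legacy plain-text format can look like, here is a minimal sketch that turns an XYZ geometry into a CML `molecule` element. It is not the ioChem-BD converter: the function name `xyz_to_cml` and the choice of XYZ as input are assumptions made for this example, and only the basic CML `molecule`, `atomArray` and `atom` elements are used.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # CML schema namespace


def xyz_to_cml(xyz_text: str, molecule_id: str = "m1") -> str:
    """Convert XYZ-format text into a minimal CML <molecule> element.

    Hypothetical helper for illustration only; a real converter would also
    capture energies, basis sets, and other calculation metadata.
    """
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0])
    # XYZ layout: atom count, comment line, then one "El x y z" line per atom.
    atom_lines = lines[2:2 + n_atoms]
    if len(atom_lines) != n_atoms:
        raise ValueError("atom count does not match the coordinate lines")

    ET.register_namespace("", CML_NS)
    molecule = ET.Element(f"{{{CML_NS}}}molecule", {"id": molecule_id})
    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for index, line in enumerate(atom_lines, start=1):
        element, x, y, z = line.split()[:4]
        ET.SubElement(atom_array, f"{{{CML_NS}}}atom", {
            "id": f"a{index}",
            "elementType": element,
            "x3": x, "y3": y, "z3": z,
        })
    return ET.tostring(molecule, encoding="unicode")


if __name__ == "__main__":
    water = "3\nwater\nO 0.000 0.000 0.117\nH 0.000 0.757 -0.471\nH 0.000 -0.757 -0.471\n"
    print(xyz_to_cml(water))
```

Once every program's output is mapped to the same element names, search and display code only has to understand one format instead of one per quantum chemistry package.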

ioChem-BD features

• Dynamic, independent templates for data extraction and data display
• Data representation set on top of priorities (XML-CML)
• Responsive design (any device is able to render our content)
• Data easily exportable to other formats
• Secure connections
• Fully compliant with the latest web standards

Page 9: ioChem-BD, a solution for managing Big Data in Computational Chemistry

Performance of our new extraction library

[Chart: Conversion time vs file size, plain text to CompChem CML. x-axis: File size (kB), from 112.73 to 68,328.04. y-axis: Parsing time (s), from 0 to 450. Series: jumbo-converters, jumbo-saxon, jumbo-saxon with keep field. Annotations: ≈14x, ≈4x.]

[Diagram labels: Upload, Convert, Store, User interfaces (Web, Shell); User files (input/output); Conversion templates; Create & Browse; Search; Manage; Publish; Share; Convert.]

Page 10: ioChem-BD, a solution for managing Big Data in Computational Chemistry

Workflow steps (1): Create

Results files are uploaded from the user's disk space.
• Create shell client
• Create web interface
• Certify results (True Data)
• Validation (Convergence WF, Geometries)
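The validation step can be pictured with a small sketch like the one below. The slides do not describe the actual checks ioChem-BD performs, so this is only an assumption of what such a check could look for in a Gaussian-style output: the "Normal termination" line that closes a successful run and the "Stationary point found" message of a converged geometry optimization. The function name `check_gaussian_output` is hypothetical.

```python
def check_gaussian_output(text: str) -> dict:
    """Return simple pass/fail flags for a Gaussian-style output file.

    Illustrative only: the marker strings are the ones Gaussian prints,
    but the set of checks is an assumption, not the ioChem-BD rule set.
    """
    lines = text.splitlines()
    return {
        # A successful run ends with a "Normal termination of Gaussian" line.
        "normal_termination": any("Normal termination" in line for line in lines),
        # A converged geometry optimization reports a stationary point.
        "geometry_converged": any("Stationary point found" in line for line in lines),
    }


if __name__ == "__main__":
    sample = (
        " Optimization completed.\n"
        "    -- Stationary point found.\n"
        " Normal termination of Gaussian 09 at Mon Dec  1 10:00:00 2014.\n"
    )
    print(check_gaussian_output(sample))
```

Running checks of this kind at upload time means that a calculation which failed or did not converge is flagged before it is stored and certified.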

 

Create: Shell client

Page 11: ioChem-BD, a solution for managing Big Data in Computational Chemistry

Basic commands

Command          Description
start-rep-shell  Connect to repository (mandatory)
exit-rep         Disconnect from repository
lspro            List current path contents
pwdpro           Print current path

Project related commands

Command          Description
catpro           Display project information
cdpro            Change to project
cpro             Create a new project
mpro             Modify a project
dpro             Delete a project
findpro          Find project by its name (regex allowed)

Calculation related commands

Command          Description
loadcalc         Load calculation into repository
viewcalc         View calculation information
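The "regex allowed" note on `findpro` can be illustrated with a short sketch of name matching. This is not the shell client's code and does not show its command-line syntax, which the slides do not give: the function `find_projects`, the sample project names, and the choice of an unanchored search are all assumptions made for the example.

```python
import re


def find_projects(names: list[str], pattern: str) -> list[str]:
    """Return the project names that match a regular expression.

    Hypothetical stand-in for a findpro-style lookup; whether the real
    client anchors the pattern or searches anywhere in the name is unknown.
    """
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


if __name__ == "__main__":
    projects = ["catalysis-Rh", "catalysis-Pd", "polyoxometalates"]
    print(find_projects(projects, r"^catalysis-"))
```

A regular expression lets a single lookup cover a whole family of projects, which is useful when project names follow a naming scheme.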

Create: Shell client

Create: Web interface

Page 12: ioChem-BD, a solution for managing Big Data in Computational Chemistry

Workflow steps (2): Create

The Create module manages results and facilitates advanced data treatment.

• Manage
– Post-processing
– Organize project collections
– Enrich data: description, keywords, additional files
– Reports: generate Sup. Info. files (pdf) for publishing
– Reaction energy paths
– Consistency (level of theory)
– Thermodynamic corrections
– Kinetic analysis (TOF, % e.e.)
– Molecular descriptors (QSAR)
– etc …
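As an example of the kinetic analysis listed above, the enantiomeric excess can be estimated from the free-energy difference between the two competing transition states. Assuming transition-state theory and that selectivity is set by a single pair of transition states, the rate ratio is exp(ΔΔG‡/RT), so e.e. = tanh(ΔΔG‡/2RT). The sketch below applies that formula; the function name and the default temperature are choices made for this example, not part of ioChem-BD.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def enantiomeric_excess(ddg_kcal: float, temperature: float = 298.15) -> float:
    """Percent e.e. from the free-energy gap (kcal/mol) between two transition states.

    With k_major / k_minor = exp(ddG / RT),
    e.e. = (k_major - k_minor) / (k_major + k_minor) = tanh(ddG / (2 RT)).
    """
    return 100.0 * math.tanh(ddg_kcal / (2.0 * R_KCAL * temperature))


if __name__ == "__main__":
    for gap in (0.0, 1.0, 2.0, 3.0):
        print(f"ddG = {gap:.1f} kcal/mol -> e.e. = {enantiomeric_excess(gap):.1f} %")
```

The steep dependence on the energy gap is why a consistent level of theory across all stored structures matters: an error of about 1 kcal/mol changes the predicted selectivity substantially.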

Create: Web interface

Page 13: ioChem-BD, a solution for managing Big Data in Computational Chemistry

Workflow steps (3): Browse

Results can then be published and made available for viewing and downloading by the general public in the Browse module.
• Handle URL generator
• Rich XML Supporting Information files
• Linked to a published manuscript

Browse: Web interface

Page 14: ioChem-BD, a solution for managing Big Data in Computational Chemistry

Current project status

• Private & Demo servers up (www.iochem-bd.org)
• Supported formats:
– Gaussian, ADF, VASP
– Molcas (50%)
• Testing integrity (user-driven tests)
• Checking data captured & displayed
• Reports module (50%)
• To do: syndicate distributed browsers, links to external databases, …

 

Acknowledgements

Moises Álvarez

N. Lopez, F. Maseras, J. M. Poblet, C. De Graaf