Looking for a (standard) Common Format for (Quantum) Computational Chemistry

A WG activity within COST action 23 (WG D23/0006/01)

– Elda Rossi, Andrew Emerson – CINECA
– Gian Luigi Bendazzoli, Antonio Monari – Università di Bologna
– Renzo Cimiraglia, Celestino Angeli, Stefano Borini – Università di Ferrara
– Daniel Maynau, Stefano Evangelisti – IRSAMC, Toulouse
– José Sanchez-Marin – Universitat de Valencia
– Peter Szalay – Eötvös Loránd University
– Rosa Caballol – Universitat Rovira i Virgili, Tarragona

Motivation · Vocabulary · Wrappers

Mar 28, 2015

Motivation for the work

To build a meta-system for supporting research collaboration in the field of “Localised Orbitals in post-SCF methods … Linear Scaling methods in a Multi-Reference context”

The scenario

• Different laboratories need to collaborate
• Different “home-made” codes need to be used together, since they give different views of the same problem
• General-purpose “basic” codes are needed to pre-compute data in a sort of pipeline
• Programmes should remain on their original sites, under the responsibility of their authors
• Different platforms
• Network connections (grid architecture)
• Workflow

The need for a Common Format

The first problem we faced: how can different codes (on different platforms) communicate?

We need a Common Format for (at least) Quantum Chemistry codes.

Preliminary steps

Looking around …
o CML has been available for a long time
o XML is used by Accelrys for internal files
o XML is used by ArgusLab for internal files

None of them is completely suited to computational chemistry: they mainly cover structural chemistry, with no Quantum Chemistry properties.

XML seems the best technology, so we decided to try another XML-based format.

HDF5 looked nice for storing the large binary data typical of QC.

How the engine should work

[Diagram. Labels: Data Repository (XML/HDF), IN-wrapper, IN-files, Program, OUT-files, OUT-wrapper]

• Leaves the program unchanged
• One wrapper for each program: if a code is added, only one wrapper has to be written

QCML: an XML format for QC

To be as general as possible, we need to write down a hierarchical schema of Quantum Chemistry quantities.

As a first approximation, three domains can be identified:

o FACTS (base facts): initial data describing the physics of the system
o DERIVED (derived facts): quantities computed from FACTS using QC algorithms (energies, properties, integrals, coefficients, …)
o W-FLOW (parameters): which codes are in the pipeline, specific input data, …

• A base fact is a fact that is a given in the world and is remembered (stored) in the system.
• A derived fact is created by an inference or a mathematical calculation from terms, facts, other derivations, or even action assertions.

FACT: molecule

<system title date program author>
  <molecule nElectrons charge spinMultiplicity spaceSymmetry>
    <symmetry groupName/>
    <geometry type unit numAtoms symmetryRef>
      <atom symbol isotope x3 y3 z3/>
    <basis name type numOrbitals>
      <atomBase angularMomMAX symbol>
        <angularMom value symbol numOrbitals>
          <orbital id numPrimitives>
            <exps/>
            <coeffs/>

• Symmetry: group name and other symmetry data
• Geometry: Cartesian only, full or symmetry-unique
• Basis: by name or fully defined
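
As a rough illustration of the outline above, the following Python sketch builds a small FACTS fragment with the standard-library DOM. The element and attribute names come from the slide; the attribute values (title, water geometry, basis name) and the function name build_molecule_doc are illustrative assumptions, and the actual QCML schema may constrain them differently.

```python
from xml.dom import minidom


def build_molecule_doc():
    """Build a tiny QCML-like document; all values are illustrative."""
    doc = minidom.Document()

    system = doc.createElement("system")
    system.setAttribute("title", "water example")  # hypothetical title
    doc.appendChild(system)

    molecule = doc.createElement("molecule")
    molecule.setAttribute("nElectrons", "10")
    molecule.setAttribute("charge", "0")
    molecule.setAttribute("spinMultiplicity", "1")
    system.appendChild(molecule)

    # Geometry is Cartesian only; the coordinates are approximate.
    atoms = [
        ("O", 0.000, 0.000, 0.000),
        ("H", 0.000, 0.757, 0.587),
        ("H", 0.000, -0.757, 0.587),
    ]
    geometry = doc.createElement("geometry")
    geometry.setAttribute("type", "cartesian")
    geometry.setAttribute("unit", "angstrom")
    geometry.setAttribute("numAtoms", str(len(atoms)))
    molecule.appendChild(geometry)

    for symbol, x, y, z in atoms:
        atom = doc.createElement("atom")
        atom.setAttribute("symbol", symbol)
        atom.setAttribute("x3", f"{x:.3f}")
        atom.setAttribute("y3", f"{y:.3f}")
        atom.setAttribute("z3", f"{z:.3f}")
        geometry.appendChild(atom)

    # Basis given by name rather than fully defined.
    basis = doc.createElement("basis")
    basis.setAttribute("name", "cc-pVDZ")
    molecule.appendChild(basis)
    return doc


if __name__ == "__main__":
    print(build_molecule_doc().toprettyxml(indent="  "))
```

Giving the basis by name keeps the file short; a fully defined basis would add the atomBase, angularMom and orbital levels under the same basis element.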

DERIVED data: computedData

<system …>
  <computedData>
    <energy unit levelOfTheory quality value>
      <state spaceSymmetry spinMultiplicity excitationLevel/>
    <property unit levelOfTheory quality value>
      <state “bra” spaceSymmetry spinMultiplicity excitationLevel/>
      <state “ket” spaceSymmetry spinMultiplicity excitationLevel/>
      <operator order name/>
    <file address URL/>

A “schema” has been written for QCML.
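
To show how a wrapper might consume this part of the format, here is a minimal Python sketch that reads energies and file references from a computedData block. The sample document, its made-up numerical values, the file name integrals.h5 and the function names are assumptions for illustration only; the element and attribute names follow the outline above.

```python
from xml.dom import minidom

# Hypothetical sample; the energy values are placeholders, not results.
SAMPLE = """<system title="example">
  <computedData>
    <energy unit="hartree" levelOfTheory="SCF" value="-1.000">
      <state spaceSymmetry="A1" spinMultiplicity="1" excitationLevel="0"/>
    </energy>
    <energy unit="hartree" levelOfTheory="FCI" value="-1.100">
      <state spaceSymmetry="A1" spinMultiplicity="1" excitationLevel="0"/>
    </energy>
    <file address="integrals.h5"/>
  </computedData>
</system>"""


def read_energies(xml_text):
    """Return {levelOfTheory: (value, unit)} for every <energy> element."""
    doc = minidom.parseString(xml_text)
    energies = {}
    for node in doc.getElementsByTagName("energy"):
        level = node.getAttribute("levelOfTheory")
        value = float(node.getAttribute("value"))
        energies[level] = (value, node.getAttribute("unit"))
    return energies


def read_file_refs(xml_text):
    """Return the addresses of external data files referenced by <file>."""
    doc = minidom.parseString(xml_text)
    return [n.getAttribute("address") for n in doc.getElementsByTagName("file")]


if __name__ == "__main__":
    print(read_energies(SAMPLE))
    print(read_file_refs(SAMPLE))
```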

DERIVED: computedData/file

Two possible strategies:
1. Leave data in their native format and translate them only when needed. Maintain different versions (formats) of the same data.
2. Define a “standard” format for binary data and convert them anyway.

Problem with large binary datasets: include the reference, not the actual data.

The second strategy was the solution of choice. HDF5 appears to be a good solution.

HDF Mission

To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.

• Format and software for scientific data
• Stores images, multidimensional arrays, tables, etc.
• Emphasis on storage and I/O efficiency
• Free and commercial software support
• Emphasis on standards
• Users from many engineering and scientific fields

Example HDF5 file

[Diagram. Group labels: “/” (root), “/AO”, “/MO”, Property, “/mono”, “/bi”, “/coefficients”. Dataset labels: Overlap, Kinetic, Repulsion, Kinetic+Repulsion, 4-D array, Table]

Table:

Orb | occ | energy
----|-----|-------
 1  | 0   | 0.35
 2  | 0.5 | 0.26
 3  | 2.  | 0.69

HDF file structure for QC

Root
  AO
    <i/j>
    <i/T/j>
    <i/Vnuc/j>
    <i/T/j>+<i/Vnuc/j>
    <ij/kl>
  MO
    <i/T/j>
    <i/V/j>
    <i/T/j>+<i/Vnuc/j>
    <ij/kl>
    coeff(i,j)
  Property
    <i/p/j>

Metadata:
  Name, QCML_ref, Norb
  Norb
  Spin Polar.
  Orb Classif: Core, Active, Virtual
  Orb Energies
  Orb Symm
  [1-order]

+ format metadata (integer, binary, endianness, …)
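
To give a feel for why the integrals go into a binary repository rather than into the XML, the sketch below lists one possible set of dataset paths with their shapes and sizes as a function of Norb. The path names (for example /AO/bi/integrals), the 8-byte value size and the absence of any symmetry packing are assumptions made for this illustration; the slide does not fix them.

```python
def qc_hdf_layout(norb, bytes_per_value=8):
    """Return {dataset path: (shape, size in bytes)} for a given Norb.

    Dataset names are hypothetical; no symmetry packing is assumed.
    """
    one_electron = {
        "AO": ["overlap", "kinetic", "nuclear"],
        "MO": ["kinetic", "nuclear"],
    }
    layout = {}
    for group, names in one_electron.items():
        # One-electron quantities <i/op/j>: Norb x Norb matrices.
        for name in names:
            layout[f"/{group}/mono/{name}"] = (
                (norb, norb),
                norb ** 2 * bytes_per_value,
            )
        # Two-electron integrals <ij/kl>: a 4-D array.
        layout[f"/{group}/bi/integrals"] = (
            (norb, norb, norb, norb),
            norb ** 4 * bytes_per_value,
        )
    # MO coefficients coeff(i,j).
    layout["/MO/coefficients"] = ((norb, norb), norb ** 2 * bytes_per_value)
    return layout


if __name__ == "__main__":
    for path, (shape, size) in qc_hdf_layout(100).items():
        print(f"{path:24s} {shape!s:24s} {size / 1e6:10.2f} MB")
```

Under these assumptions, 100 orbitals already give an 800 MB two-electron array, while each one-electron matrix stays at 80 kB.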

QCML processing: wrappers

• One pair of wrappers for each code in the metasystem
• They should be written and maintained by the authors of the chemical codes
• XML processing can be used (DOM), but … in what language?
  o Fortran: no easy and stable DOM available
  o Scripting languages (Perl/Python/Java): not known by chemists

We tried both ways (Fortran & Python).

Fortran DOM: drawbacks

The only problem is the Fortran binding
o It doesn’t exist (at least as of last year …)
o DOM is OO and Fortran is not

There is a C binding (Gdome2)
o Gdome2 was installed, with very hard work, on a mainframe platform (it was conceived for Linux)

We are currently converting it to Fortran, by adopting the DOM recommendations (simplified …)

Why Fortran

GOOD
• Users don't need to learn a new language
• Homogeneous environment

BAD
• Tricky: needs an external library (f77xml) built on top of gdome2
• Porting problems for gdome2/libxml2 may arise

F77xml library

Still in development
o v0.4 is out (experimental, with limited features)
o v1.0 upcoming, API changed to be nearly DOM2 compliant

Written in C on top of gdome2
http://gdome2.cs.unibo.it/index.html

Designed for interfacing to F77 (also F90 soon)
Reduced namespace pollution

Cons:
● F77 syntax is difficult (DOM2 + tricks)
● F90 syntax is simpler
● A pre-processor will convert F90 syntax to F77

http://freshmeat.net/projects/f77xml

F77xml library - V1.0 example

Gdome2 (C):
GdomeNode* gdome_el_firstChild (GdomeElement *self, GdomeException *exc);

F90:
Call f77xml_el_firstChild(nodeCode, elemCode, exc)
First position: return value
nodeCode, elemCode, exc are mapped to INTEGER

F77:
Func='el_firstChild'
Call xp3t1(nodeCode,func,elemCode,exc)
Multiplexer function x:
  p3: 3 parameters (+ function name)
  t1: type 1 parameter schema (code/code/error)
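
The idea of integer codes and a multiplexer entry point can be sketched in Python as follows. This is not the f77xml implementation: the HandleTable class, the DISPATCH table and the error-code convention (0 for success, 1 for failure, node code 0 for "no node") are assumptions made for the sketch; only the names xp3t1 and el_firstChild are taken from the slide.

```python
from xml.dom import minidom


class HandleTable:
    """Map integer codes to DOM nodes, for callers that cannot hold objects."""

    def __init__(self):
        self._nodes = {}
        self._next = 1

    def put(self, node):
        code = self._next
        self._nodes[code] = node
        self._next += 1
        return code

    def get(self, code):
        return self._nodes[code]


TABLE = HandleTable()


def el_first_child(elem):
    return elem.firstChild


# One multiplexer serves every call with the same parameter pattern.
DISPATCH = {"el_firstChild": el_first_child}


def xp3t1(func, elem_code):
    """Call `func` on the node behind elem_code; return (node_code, exc)."""
    try:
        node = DISPATCH[func](TABLE.get(elem_code))
    except KeyError:
        return 0, 1  # unknown function name or unknown code
    if node is None:
        return 0, 0  # valid call, but there is no such node
    return TABLE.put(node), 0


if __name__ == "__main__":
    doc = minidom.parseString("<system><molecule/></system>")
    root_code = TABLE.put(doc.documentElement)
    child_code, exc = xp3t1("el_firstChild", root_code)
    print(child_code, exc, TABLE.get(child_code).tagName)
```

Routing every call of the same shape through one function keeps the number of externally visible symbols small, which is consistent with the "reduced namespace pollution" goal mentioned for the library.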

Why Python

GOOD
• Very easy
• Object-oriented language
• Works well with strings
• Simple and efficient DOM interface for XML
• Present in almost all UNIX/Linux distributions

BAD
• Users do need to learn a new language
• Maybe less powerful than Perl
• Usually not used by chemists

Python Wrapper

At present, a prototype works with the MOLPRO-FCI chain:
• It takes information from the XML repository
• Writes the proper MOLPRO and FCI input
• Starts the two programs

With a different XML file, users only need to specify the file name and some simple parameters (orbital guess for FCI).
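
A minimal sketch of the first two steps of such an IN-wrapper is given below. It reads a QCML-like fragment and renders a plain-text input. The sample document, the function name in_wrapper and the keyword-style output layout are assumptions for illustration; the output is deliberately generic and is not MOLPRO or FCI input syntax. Starting the programs is site-specific and is left out.

```python
from xml.dom import minidom

# Hypothetical repository entry; values are illustrative.
QCML = """<system title="h2 example">
  <molecule charge="0" spinMultiplicity="1">
    <geometry type="cartesian" unit="bohr" numAtoms="2">
      <atom symbol="H" x3="0.0" y3="0.0" z3="0.0"/>
      <atom symbol="H" x3="0.0" y3="0.0" z3="1.4"/>
    </geometry>
    <basis name="cc-pVDZ"/>
  </molecule>
</system>"""


def in_wrapper(xml_text):
    """Translate a QCML-like document into a generic text input."""
    doc = minidom.parseString(xml_text)
    molecule = doc.getElementsByTagName("molecule")[0]
    geometry = molecule.getElementsByTagName("geometry")[0]
    basis = molecule.getElementsByTagName("basis")[0]

    lines = [
        f"title {doc.documentElement.getAttribute('title')}",
        f"charge {molecule.getAttribute('charge')}",
        f"multiplicity {molecule.getAttribute('spinMultiplicity')}",
        f"basis {basis.getAttribute('name')}",
        f"unit {geometry.getAttribute('unit')}",
    ]
    # One line per atom: symbol followed by Cartesian coordinates.
    for atom in geometry.getElementsByTagName("atom"):
        coords = " ".join(atom.getAttribute(k) for k in ("x3", "y3", "z3"))
        lines.append(f"atom {atom.getAttribute('symbol')} {coords}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(in_wrapper(QCML), end="")
```

Only the rendering step depends on the target program, so supporting another code would mean replacing the lines that build the text while keeping the XML reading unchanged.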

Python or not

Python is very simple to learn and works very efficiently with XML

Scripts written in Python (at least for prototypes) are quite clear, linear and easy to maintain or upgrade

The possibility of a GUI could make our project much more user-friendly

What we have done …

Single platform: IBM SP4

Two code chains:
• MolPro to FCI
• MolPro to CasDI

[Diagram of the MolPro-to-FCI chain, from “Start here” to “Stop here”. Labels: QCML Repository, HDF5 Repository, IN-wrapper, MolPro IN-file, MolPro, FCIDUMP, OUT-wrapper, IN-wrapper, FCI IN-file, Bin file for FCI, FCI]

In conclusion …

Two important hints on data…
1. Use some XML dialect for describing simple structured data
2. Use HDF5 for storing large arrays and binary data

Need for a good and easy API to XML & HDF

How to manage the workflow
How to manage the grid connection