Parsing Quantum Chemistry Output files Using Jflex and CUP Sudhakar Pamidighantam NCSA, University of Illinois


Dec 19, 2015


Maud Sanders
Page 1: Parsing Quantum Chemistry Output files Using Jflex and CUP Sudhakar Pamidighantam NCSA, University of Illinois.

Parsing Quantum Chemistry Output files

Using Jflex and CUP

Sudhakar Pamidighantam

NCSA, University of Illinois


JFlex

• JFlex (http://jflex.de) is a lexical analyzer generator for Java.

Lexical analysis is the process of taking an input string of characters (such as quantum chemistry output from an application such as Gaussian) and producing a sequence of symbols called "lexical tokens", or just "tokens", which a parser can handle more easily.

Tokens are symbols produced when regular expressions match the input; the parser uses them to decide on further action.


Typical Tokens from Gaussian Output Strings

String                                          Token (Symbol)

"Number of steps in this run"                   FoundIter
"Step number"                                   NSearch
"NUMERICALLY ESTIMATING GRADIENTS ITERATION"    NSearch
"CCSD(T)="                                      Energy
"SCF Done: E(RHF) ="                            Energy
"Maximum Force"                                 MaxGrad
"RMS Force"                                     RmsGrad
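To see which lines of an output file would yield which token, the mapping above can be mimicked in plain Java. This is only an illustrative sketch: the class `GaussianTokenSketch` and its `classify` method are hypothetical, and it uses simple substring matching rather than a JFlex-generated scanner.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class GaussianTokenSketch {
    // Marker text -> token name, copied from the slide's table.
    private static final Map<String, String> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("Number of steps in this run", "FoundIter");
        TOKENS.put("Step number", "NSearch");
        TOKENS.put("NUMERICALLY ESTIMATING GRADIENTS ITERATION", "NSearch");
        TOKENS.put("CCSD(T)=", "Energy");
        TOKENS.put("SCF Done: E(RHF) =", "Energy");
        TOKENS.put("Maximum Force", "MaxGrad");
        TOKENS.put("RMS Force", "RmsGrad");
    }

    /** Returns the token name for the first marker found in the line, or null. */
    static String classify(String line) {
        for (Map.Entry<String, String> entry : TOKENS.entrySet()) {
            if (line.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] lines = {
            " SCF Done: E(RHF) = -7.85284496695 A.U. after 8 cycles",
            " Maximum Force 0.000000 0.000450 YES",
            " more text"
        };
        for (String line : lines) {
            System.out.println(classify(line) + " <- " + line);
        }
    }
}
```

A generated scanner does the same job with regular expressions and lexical states, which also lets it pick up the numeric value that follows a marker.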


A Lexer Input File

Taken from the examples at http://jflex.de/manual.html#ExampleUserCode

• Java Specifics

import java_cup.runtime.*;

%%

• Options and Declarations

/*
   The name of the class JFlex will create will be GoptfreqLexer.
   JFlex will write the code to the file GoptfreqLexer.java.
*/
%class GoptfreqLexer
%public

• %unicode defines the set of characters the scanner will work on. For scanning text files, %unicode should always be used.

• %cup switches to CUP compatibility mode to interface with a CUP-generated parser.

• %cupdebug creates a main function in the generated class that expects the name of an input file on the command line and then runs the scanner on this input file. It prints line, column, matched text, and CUP symbol name for each returned token to standard out.

Continued


JFlex Input: Lexical States

%state ITER
%state INTVALUE
%state FLOATVALUE
%state ITER2
%state ITER3
%state FLOAT1
%state FLOAT2
%state IGNOREALL
%state INPUT
%state INPUTA
%state INPUTB
%state INPUTC
%state INPUTD
%state INPUTE
%state INPUTF

A lexical state is identified by its name and controls which rules can match: a rule listed under <STATE> is only tried while the scanner is in that state.


JFlex File Structure

• The code included in %{ ... %} is copied verbatim into the generated lexer class source. Here you can declare member variables and functions that are used inside scanner actions.

/* Macro Declarations
   These declarations are regular expressions that will be used later
   in the Lexical Rules Section. */

LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]
WhiteSpace = {LineTerminator} | [ \t\f]
Comment = {TraditionalComment} | {EndOfLineComment} | {DocumentationComment}
TraditionalComment = "/*" [^*] ~"*/"
EndOfLineComment = "//" {InputCharacter}* {LineTerminator}
DocumentationComment = "/**" {CommentContent} "*"+ "/"
CommentContent = ( [^*] | \*+ [^/*] )* /* adjust syntax font-coloring */
Identifier = [:jletter:] [:jletterdigit:]*
dec_int_lit = 0 | [1-9][0-9]*
dec_int_id = [A-Za-z_][A-Za-z_0-9]*
DIGIT = [0-9]
/* Inside a character class "|" is a literal character, so the sign is written [+-].
   The integer part of FLOAT is optional so that values such as .000000 match. */
FLOAT = [+-]?{DIGIT}*"."{DIGIT}+
INT = [+-]?{DIGIT}+
BOOL = [TF]
EQ = "="
STRING = [A-Z]+
GRAB = [^(" "|\r|\n|\r\n| \t\f)]+

%%
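The INT and FLOAT macros can be tried out with java.util.regex before they go into the lexer specification. The sketch below is an assumption-laden analogue, not JFlex code: the class `MacroRegexSketch` and its method names are hypothetical. It uses the form of FLOAT whose integer part is optional, which is what coordinates such as `.000000` in Gaussian's "Standard orientation" block need; a pattern that insists on a leading digit rejects them.

```java
import java.util.regex.Pattern;

final class MacroRegexSketch {
    // Java analogue of INT: optional sign followed by digits.
    static final Pattern INT = Pattern.compile("[+-]?[0-9]+");
    // Java analogue of FLOAT with an optional integer part.
    static final Pattern FLOAT = Pattern.compile("[+-]?[0-9]*\\.[0-9]+");
    // Variant that requires at least one digit before the decimal point.
    static final Pattern STRICT_FLOAT = Pattern.compile("[+-]?[0-9]+\\.[0-9]+");

    static boolean isInt(String text) {
        return INT.matcher(text).matches();
    }

    static boolean isFloat(String text) {
        return FLOAT.matcher(text).matches();
    }

    static boolean isStrictFloat(String text) {
        return STRICT_FLOAT.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String text : new String[] {"7", "-.931526", ".000000", "0.000450"}) {
            System.out.println(text + " int=" + isInt(text)
                + " float=" + isFloat(text)
                + " strictFloat=" + isStrictFloat(text));
        }
    }
}
```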


JFlex File

/* ------------------------Lexical Rules Section---------------------- */

/*
   This section contains regular expressions and actions, i.e. Java
   code, that will be executed when the scanner matches the associated
   regular expression. */

/* YYINITIAL is the state at which the lexer begins scanning. So
   these regular expressions will only be matched if the scanner is in
   the start state YYINITIAL. */

<YYINITIAL> {


Jflex Symbol Generation

/* Return the token STPT declared in the class sym that was found. */
"-- Stationary point found" { return symbol(Goptfreqsym.STPT); }

"Standard orientation:" { return symbol(Goptfreqsym.GEOM); }

/* Print the token found that was declared in the class sym and then return it. */
"Standard orientation" { System.out.print(" + "); return symbol(Goptfreqsym.GEOM); }


Scanner Methods and Fields Available in Actions

• void yybegin(int lexicalState) /* enters the lexical state lexicalState */
• String yytext() /* returns the matched input text region */
• ... (see http://www.jflex.de/manual.html for more methods)

<YYINITIAL> {
  "Stationary point found" { yybegin(ITER); return new Symbol(FinalCoordSym.FOUNDITER); }
}

<ITER> {
  "X Y Z" { yybegin(INPUTF); return new Symbol(FinalCoordSym.INPUT1); }

  "THE_END_OF_FILE" { yybegin(IGNOREALL); return new Symbol(FinalCoordSym.SCFDONE); }

  "Standard orientation:" { yybegin(IGNOREALL); return new Symbol(FinalCoordSym.SCFDONE); }

  .|\n {}
}
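The effect of yybegin can be pictured as a small state machine in which each state only reacts to its own markers. The following plain-Java sketch is a simplification under stated assumptions: `LexicalStateSketch` and `scan` are hypothetical names, it works line by line instead of character by character, and only the "Stationary point found" and "Standard orientation:" rules are modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LexicalStateSketch {
    enum State { YYINITIAL, ITER, IGNOREALL }

    /** Returns the token names produced for the given lines. */
    static List<String> scan(List<String> lines) {
        State state = State.YYINITIAL;
        List<String> tokens = new ArrayList<>();
        for (String line : lines) {
            switch (state) {
                case YYINITIAL:
                    // Only this marker is recognized in the start state.
                    if (line.contains("Stationary point found")) {
                        tokens.add("FOUNDITER");
                        state = State.ITER; // plays the role of yybegin(ITER)
                    }
                    break;
                case ITER:
                    if (line.contains("Standard orientation:")) {
                        tokens.add("SCFDONE");
                        state = State.IGNOREALL; // plays the role of yybegin(IGNOREALL)
                    }
                    break;
                case IGNOREALL:
                    // Everything is skipped once this state is reached.
                    break;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            " Standard orientation:",
            " -- Stationary point found.",
            " some other text",
            " Standard orientation:",
            " Standard orientation:");
        System.out.println(scan(lines));
    }
}
```

The first "Standard orientation:" line is skipped because the scanner is still in YYINITIAL, and the last one is skipped because the scanner has moved on to IGNOREALL; that is the behavior lexical states are meant to provide.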


Parsing for Geometry

<INPUTF> {
  "---------------------------------------------------------------------"
    { yybegin(INPUT); return new Symbol(FinalCoordSym.DASH1); }
}

<INPUT> {
  {INT} { yybegin(INPUTA); return new Symbol(FinalCoordSym.INPUT2, Integer.valueOf(yytext())); }

  "---------------------------------------------------------------------"
    { yybegin(ITER); return new Symbol(FinalCoordSym.DASH2); }
}

<INPUTA> { {INT} { yybegin(INPUTB); return new Symbol(FinalCoordSym.INPUT3, Integer.valueOf(yytext())); } }

<INPUTB> { {INT} { yybegin(INPUTC); return new Symbol(FinalCoordSym.INPUT4, Integer.valueOf(yytext())); } }

<INPUTC> { {FLOAT} { yybegin(INPUTD); return new Symbol(FinalCoordSym.INPUT5, Float.valueOf(yytext())); } }

<INPUTD> { {FLOAT} { yybegin(INPUTE); return new Symbol(FinalCoordSym.INPUT6, Float.valueOf(yytext())); } }

<INPUTE> { {FLOAT} { yybegin(INPUT); return new Symbol(FinalCoordSym.INPUT7, Float.valueOf(yytext())); } }

<IGNOREALL> { .|\n {} }

.|\n {}

 Standard orientation:
 ---------------------------------------------------------------------
 Center     Atomic     Atomic              Coordinates (Angstroms)
 Number     Number      Type              X           Y           Z
 ---------------------------------------------------------------------
    1          7          0            .000000     .111062     .000000
    2          1          0           -.931526    -.259200     .000000
    3          1          0            .465763    -.259119     .806530
    4          1          0            .465763    -.259119    -.806530
 ---------------------------------------------------------------------
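Each coordinate row in the block above is three integers followed by three decimals, which is the sequence the INPUT, INPUTA ... INPUTE states walk through one token at a time. As a rough cross-check of what the lexer should extract, the same rows can be pulled out with a single regular expression in plain Java. This is a sketch, not the generated scanner: `GeometryRowSketch`, `Atom` and `parse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GeometryRowSketch {
    /** One row of the "Standard orientation" table. */
    static final class Atom {
        final int center;
        final int atomicNumber;
        final int atomicType;
        final double x;
        final double y;
        final double z;

        Atom(int center, int atomicNumber, int atomicType, double x, double y, double z) {
            this.center = center;
            this.atomicNumber = atomicNumber;
            this.atomicType = atomicType;
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    private static final String INT = "([+-]?\\d+)";
    // Digits before the decimal point are optional, as in ".111062".
    private static final String FLOAT = "([+-]?\\d*\\.\\d+)";
    private static final Pattern ROW = Pattern.compile(
        "\\s*" + INT + "\\s+" + INT + "\\s+" + INT
        + "\\s+" + FLOAT + "\\s+" + FLOAT + "\\s+" + FLOAT + "\\s*");

    /** Keeps only the lines that look like coordinate rows. */
    static List<Atom> parse(List<String> lines) {
        List<Atom> atoms = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                atoms.add(new Atom(
                    Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)),
                    Double.parseDouble(m.group(4)),
                    Double.parseDouble(m.group(5)),
                    Double.parseDouble(m.group(6))));
            }
        }
        return atoms;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            " Number Number Type X Y Z",
            " -----------------------------------",
            " 1 7 0 .000000 .111062 .000000",
            " 2 1 0 -.931526 -.259200 .000000",
            " -----------------------------------");
        for (Atom atom : parse(lines)) {
            System.out.println(atom.center + " Z=" + atom.atomicNumber
                + " (" + atom.x + ", " + atom.y + ", " + atom.z + ")");
        }
    }
}
```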


CUP Parser Generator

• CUP is a parser generator for Java.

• It generates LALR (Look-Ahead LR) parsers from simple specifications.

• It is similar to YACC.

• These tools are used to build relationships out of basic structures, for compilers (and natural languages).


Cup File Structure

• 4 Main parts

Part 1.

preliminary and miscellaneous declarations

Imported Code ( classes)

Initialization

Invoking Scanner

Getting Tokens ( lexical tokens)


Cup File Structure

• Part 2

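Declares terminals and non-terminals

Associates object classes with them

Terminals either carry no value or carry an object of a declared class (for example Integer)

Terminals are the symbols the scanner returns for matched strings; non-terminals are built from them by the grammar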


Cup File structure

• Part 3

Specification of Precedence and Associativity of Terminals

• Part 4

Grammar


CUP Usage

If the specification is in a file parser.cup, then

java java_cup.Main < parser.cup

produces two Java source files:

sym.java
The sym class contains a series of constant declarations, one for each terminal symbol. This is typically used by the scanner to refer to symbols (e.g. with code such as "return new Symbol(sym.SEMI);").

parser.java
The parser class implements the parser itself.


CUP File Structure: Note

• To calculate and print values of each expression, we must embed Java code within the parser to carry out actions at various points.

• In CUP, actions are contained in code strings, which are surrounded by delimiters of the form {: and :}. In general, the system records all characters within the delimiters, but does not try to check that they contain valid Java code.


Example finalcoord.cup

Part 1. Preliminaries / Initialization

import java_cup.runtime.*;
import javax.swing.*;
import java.util.*;
import java.io.*;

/* comment code
 Standard orientation:
 ---------------------------------------------------------------------
 Center     Atomic     Atomic              Coordinates (Angstroms)
 Number     Number      Type              X           Y           Z
 ---------------------------------------------------------------------
    1          7          0            .000000     .111062     .000000
    2          1          0           -.931526    -.259200     .000000
    3          1          0            .465763    -.259119     .806530
    4          1          0            .465763    -.259119    -.806530
 ---------------------------------------------------------------------

 OUTPUT FORMAT:____________________________________________________________
 1NSERCH= 0
 more text
 SCF Done: E(RHF) = -7.85284496695 A.U. after 8 cycles
 more text
 Maximum Force 0.000000 0.000450 YES
 RMS Force 0.000000 0.000300 YES
 more text
 TO MONITOR:____________________________________________________________
 iteration, energy

 MANUALLY ADD TO CUP-GENERATED CLASS IN SCFaParser.java:________________
 //add to CUP$SCFaParser$actions
 public ParseSCF2 parseSCF;

 //add to the constructor of CUP$SCFaParser$actions
 parseSCF = new ParseSCF2();
*/


Example finalcoord.cup

Part 1 continued…

action code {:
    //__________________________________
    public static boolean DEBUG = true;
    private static JTable table;
    private static final String tableLabel = "SCF Intermediate Results:";
    // private static String cycle = "0";

    public static JTable getTable() {
        return table;
    }

    public static String getTableLabel() {
        return tableLabel;
    }

    // }
:}


Example finalcoord.cup

Part 2. Terminal and Non-terminal Declarations

terminal INPUT1, FOUNDITER, SCFDONE, DASH1, DASH2;
terminal Integer INPUT2, INPUT3, INPUT4, ITERATION;
terminal Float ENERGY, INPUT5, INPUT6, INPUT7;
non terminal startpt, scfintro, scfpat, scfcycle, cycle, grad1, grad2;
non terminal inp2, inp3, inp5, inp6, inp7, cycle1, cycle2, cycle3;


Example finalcoord.cup

Part 3. Precedence and Associativity

This part is optional and matters for ambiguous grammars.

It is not used here because our parsing is straightforward.


Example finalcoord.cup

Part 4 Grammar

// Each production starts with a non-terminal (symbol/string), then ::=, then actions, terminals, non-terminals and precedence, with a ; at the end
// Java code goes in between {: … :}
// Alternative productions are separated by |

startpt ::= scfintro scfpat SCFDONE ;

scfintro ::= FOUNDITER
    {: if (DEBUG) System.out.println("CUP:Input: found the start of Iteration"); :} ;

scfpat ::= scfpat scfcycle
    {: if (DEBUG) System.out.println("CUP:Input: in scfpat"); :}
  | scfcycle ;

scfcycle ::= INPUT1 DASH1 cycle1 DASH2 ;

cycle1 ::= cycle1 cycle2 | cycle2 ;

cycle2 ::= inp2 inp3 INPUT4 inp5 inp6 inp7 ;


Example finalcoord.cup Grammar Continued

inp2 ::= INPUT2:in2
    {: //___________________________________________________________________
       if (DEBUG) System.out.println("CUP:Input: center number "+in2); :} ;

inp3 ::= INPUT3:in3
    {: //___________________________________________________________________
       if (DEBUG) System.out.println("CUP:Input: atomic number "+in3); :} ;

inp5 ::= INPUT5:in5
    {: //___________________________________________________________________
       if (DEBUG) System.out.println("CUP:Input: x coordinate "+in5); :} ;

inp6 ::= INPUT6:in6
    {: //___________________________________________________________________
       if (DEBUG) System.out.println("CUP:Input: y coordinate "+in6); :} ;

inp7 ::= INPUT7:in7
    {: //___________________________________________________________________
       if (DEBUG) System.out.println("CUP:Input: z coordinate "+in7); :} ;
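To check which token sequences the finalcoord grammar accepts, it can help to write the productions out by hand. The sketch below is a plain-Java recursive-descent recognizer, not the LALR parser that CUP generates; the left-recursive rules (scfpat, cycle1) are rewritten as loops, and the class `FinalCoordGrammarSketch`, its `recognize` method, and the use of token names as strings are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class FinalCoordGrammarSketch {
    private static final String[] ROW =
        {"INPUT2", "INPUT3", "INPUT4", "INPUT5", "INPUT6", "INPUT7"};

    private final List<String> tokens;
    private int pos = 0;
    private int rows = 0;

    private FinalCoordGrammarSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Returns the number of coordinate rows recognized, or -1 if the input is rejected. */
    static int recognize(List<String> tokens) {
        FinalCoordGrammarSketch parser = new FinalCoordGrammarSketch(tokens);
        try {
            parser.startpt();
            return parser.pos == tokens.size() ? parser.rows : -1;
        } catch (IllegalStateException e) {
            return -1;
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "EOF";
    }

    private void expect(String name) {
        if (!peek().equals(name)) {
            throw new IllegalStateException("expected " + name + " but saw " + peek());
        }
        pos++;
    }

    // startpt ::= scfintro scfpat SCFDONE, with scfintro ::= FOUNDITER
    private void startpt() {
        expect("FOUNDITER");
        scfpat();
        expect("SCFDONE");
    }

    // scfpat ::= scfpat scfcycle | scfcycle  (one or more scfcycle)
    private void scfpat() {
        do {
            scfcycle();
        } while (peek().equals("INPUT1"));
    }

    // scfcycle ::= INPUT1 DASH1 cycle1 DASH2
    private void scfcycle() {
        expect("INPUT1");
        expect("DASH1");
        cycle1();
        expect("DASH2");
    }

    // cycle1 ::= cycle1 cycle2 | cycle2  (one or more cycle2)
    private void cycle1() {
        do {
            cycle2();
        } while (peek().equals("INPUT2"));
    }

    // cycle2 ::= inp2 inp3 INPUT4 inp5 inp6 inp7
    private void cycle2() {
        for (String name : ROW) {
            expect(name);
        }
        rows++;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "FOUNDITER", "INPUT1", "DASH1",
            "INPUT2", "INPUT3", "INPUT4", "INPUT5", "INPUT6", "INPUT7",
            "DASH2", "SCFDONE");
        System.out.println("rows recognized: " + recognize(tokens));
    }
}
```

Each coordinate row of the geometry table corresponds to one cycle2, and each table enclosed by dashed lines to one scfcycle.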


CUP Customization

java java_cup.Main options < finalcoord.cup

• Options

-package GridChem

-symbols FinalCoordSym

-parser FinalCoordParser

More options at http://www.cs.princeton.edu/~appel/modern/java/CUP/manual.html#about


Bottom-up Parser Architecture

[Figure: bottom-up parser architecture; the only surviving label is "buffer of states visited"]


[Table: parse table with columns State, Action, Go To]
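A bottom-up parser of this kind keeps a stack of the states it has visited and consults an Action table (shift, reduce, accept) and a Go To table at every step. The toy Java sketch below hard-codes those tables for the tiny grammar list ::= list ROW | ROW, which has the same shape as cycle1 ::= cycle1 cycle2 | cycle2. It is a hand-built illustration, not the tables CUP would emit; the class `ShiftReduceSketch`, the state numbers and the token name ROW are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class ShiftReduceSketch {
    // State 0: start             State 1: a complete "list" has been seen
    // State 2: list ::= ROW .    State 3: list ::= list ROW .

    /** Returns the number of reductions performed, or -1 on a syntax error. */
    static int parse(List<String> input) {
        Deque<Integer> states = new ArrayDeque<>(); // buffer of states visited
        states.push(0);
        int pos = 0;
        int reductions = 0;
        while (true) {
            String lookahead = pos < input.size() ? input.get(pos) : "EOF";
            int state = states.peek();
            if (state == 2 || state == 3) {
                // Action = reduce: pop one state per right-hand-side symbol.
                int rhsLength = (state == 2) ? 1 : 2;
                for (int i = 0; i < rhsLength; i++) {
                    states.pop();
                }
                // Go To: from state 0 the non-terminal "list" leads to state 1.
                if (states.peek() != 0) {
                    return -1;
                }
                states.push(1);
                reductions++;
            } else if (lookahead.equals("ROW")) {
                // Action = shift: push the next state and consume the token.
                states.push(state == 0 ? 2 : 3);
                pos++;
            } else if (state == 1 && lookahead.equals("EOF")) {
                return reductions; // Action = accept
            } else {
                return -1; // no table entry: syntax error
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("ROW", "ROW", "ROW")));
    }
}
```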