Chapter 3

Chapter 3 Describing Syntax and Semantics

CS 350 Programming Language Design
Indiana University Purdue University Fort Wayne

Chapter 3 Topics
- Introduction
- The General Problem of Describing Syntax
- Formal Methods of Describing Syntax
- Attribute Grammars
- Describing the Meanings of Programs: Dynamic Semantics

Introduction
- Who must use language definitions?
  - Language designers
  - Implementors
  - Programmers (the users of the language)
- Syntax
  - The form or structure of the expressions, statements, and program units
  - Defines what is grammatically correct
- Semantics
  - The meaning of the expressions, statements, and program units
- Describing syntax is easier than describing semantics

Some definitions
- A sentence is a string of characters over some alphabet
- A language is a set of valid sentences
  - The syntax rules of the language specify which strings of characters are valid sentences
- A lexeme is the lowest-level syntactic unit of a language
  - For example: sum, +, 1234
- A token is a category of lexemes
  - For example: identifier, plus_op, int_literal
- Each token may be described by separate syntax rules
  - Thus we may think of sentences as strings of lexemes rather than as strings of characters
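The lexeme/token distinction can be illustrated with a small scanner. This is only a sketch: the token names come from the slide, but the regular expressions and the `tokenize` helper are assumptions made for illustration.

```python
import re

# Token names follow the slide; the patterns themselves are assumed.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("plus_op", r"\+"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs for the input string."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "skip":          # whitespace separates lexemes
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs

print(tokenize("sum + 1234"))
# [('identifier', 'sum'), ('plus_op', '+'), ('int_literal', '1234')]
```

Each pair shows one lexeme (the matched text) together with the token (category) it belongs to.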

Describing syntax
- Syntax may be formally described using recognition or generation
- Recognition involves a recognition device R
  - Given an input string, R either accepts the string as valid or rejects it
  - R is only used in trial-and-error mode
  - A recognizer is not effective in enumerating all sentences in a language
    - Languages are usually infinite
  - The syntax analyzer part of a compiler (parser) is a recognizer

Describing syntax
- Generation
  - A language generator generates the sentences of a language
  - A grammar is a language generator
  - One can determine if a string is a sentence by comparing it with the structure given by a generator

Formal methods for describing syntax
- Noam Chomsky and John Backus independently developed similar formalisms in the 1950s
- In the mid-1950s, Chomsky identified four classes of grammars for studying linguistics
  - Regular grammars (recognizer: deterministic finite automaton, DFA)
  - Context-free grammars (recognizer: push-down automaton)
  - Context-sensitive grammars (recognizer: linear-bounded automaton)
  - Phrase structure grammars (recognizer: Turing machine)
- The first is useful for describing tokens
- Most programming languages can be described by the second

Formal methods for describing syntax
- Context-Free Grammar (CFG)
  - A language generator
  - Not powerful enough to describe the syntax of natural languages
  - Defines a class of languages called context-free languages
- Backus-Naur Form (BNF)
  - Presented in 1959 by John Backus to describe Algol 58
  - Notation was slightly improved by Peter Naur
  - BNF is equivalent to Chomsky's context-free grammars

Formal methods for describing syntax
- A meta-language is a language used to describe another language
  - BNF is a meta-language for programming languages
- In BNF . . .
  - A terminal symbol is used to represent a lexeme or a token
  - A nonterminal symbol is used to represent a syntactic class
  - A production rule defines one nonterminal symbol in terms of terminal symbols and other nonterminal symbols

Production rule example
- The following production rule defines the syntactic class of a while statement
    <while_stmt> → while ( <logic_expr> ) <stmt>
- The syntactic class being defined is on the left-hand side of the arrow (LHS)
- The text on the right-hand side (RHS) gives the definition of the LHS
- The RHS above consists of 3 terminals (tokens) and 2 nonterminals (syntactic classes)
  - Terminals: while, (, and )
  - Nonterminals: <logic_expr> and <stmt>

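Formal methods for describing syntax
- Nonterminal symbols may have multiple distinct definitions, as in . . .
    <if_stmt> → if <logic_expr> then <stmt>
    <if_stmt> → if <logic_expr> then <stmt> else <stmt>
- Alternative form
    <if_stmt> → if <logic_expr> then <stmt>
              → if <logic_expr> then <stmt> else <stmt>
- More compactly, . . .
    <if_stmt> → if <logic_expr> then <stmt>
              | if <logic_expr> then <stmt> else <stmt>
- The vertical bar | is read "or"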

Formal methods for describing syntax
- The nonterminal symbol being defined on the LHS may appear on the RHS
  - Such a production rule is recursive
- Example
  - Lists can be described using recursion
    <ident_list> → identifier | identifier , <ident_list>

Formal methods for describing syntax
- A grammar G = ( T, N, P, S ), where
  - T is a finite set of terminal symbols
  - N is a finite set of nonterminal symbols
  - P is a finite nonempty set of production rules
  - S is a start symbol representing a complete sentence
    - The start symbol is typically named <program>
- Generation of a sentence is called a derivation
  - Beginning with the start symbol, a derivation applies production rules repeatedly until a complete sentence is generated (all terminal symbols)

Formal methods for describing syntax
- An example grammar
    <program> → <stmts>
    <stmts> → <stmt> | <stmt> ; <stmts>
    <stmt> → <var> = <expr>
    <var> → a | b | c | d
    <expr> → <term> + <term> | <term> - <term>
    <term> → <var> | integer
- Any nonterminal appearing on the RHS of a production rule needs to be defined with a production rule (and thus appear on the LHS)
- In the grammar, integer is a token representing any integer lexeme

Formal methods for describing syntax
- An example derivation
    <program> => <stmts>
              => <stmt>
              => <var> = <expr>
              => a = <expr>
              => a = <term> + <term>
              => a = <var> + <term>
              => a = b + <term>
              => a = b + integer
- The symbol => is read "derives"
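A derivation of this kind can be replayed mechanically. The sketch below assumes the example grammar has the start symbol `<program>` and the rules written out in the `GRAMMAR` table (the nonterminal names are an assumption); `leftmost_derivation` is a hypothetical helper that always expands the leftmost nonterminal.

```python
# Assumed form of the example grammar: nonterminal -> list of alternatives.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["integer"]],
}

def leftmost_derivation(choices, start="<program>"):
    """Apply one alternative per step to the leftmost nonterminal.

    `choices` gives, for each step, the index of the alternative to use.
    Returns every sentential form, beginning with the start symbol.
    """
    form = [start]
    forms = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        index = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        forms.append(" ".join(form))
    return forms

steps = leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1])
print("\n=> ".join(steps))
```

The last sentential form printed contains only terminal symbols, so it is a sentence of the language.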

Derivation
- Every string of symbols in the derivation is called a sentential form
  - Including the start symbol
- A sentence is a sentential form that has only terminal symbols
- A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded next
- A derivation may be leftmost or rightmost or neither
- Derivation order has no effect on the language generated by a grammar

Parse Tree
- A parse tree is a hierarchical representation of a derivation
- Each internal node is labeled with a nonterminal symbol and each leaf is labeled with a terminal symbol
- A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees

[Figure: parse tree for a = b + integer]

Ambiguous grammar example
    <expr> → <expr> + <expr> | <expr> * <expr> | int

[Figure: two distinct parse trees for int + int * int]

Ambiguity
- The compiler decides what code to generate based on the structure of the parse tree
- The parse tree indicates the precedence of the operators
  - Does it mean ( int + int ) * int or int + ( int * int )?
- Ambiguity cannot be tolerated
- In the case of operator precedence, ambiguity can be avoided by having separate nonterminals for operators of different precedence
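The ambiguity can be checked by counting parse trees. The sketch below assumes the ambiguous grammar is `<expr> → <expr> + <expr> | <expr> * <expr> | int`; `count_parse_trees` is a hypothetical helper that tries every operator as the root of the tree.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for <expr> -> <expr> + <expr> | <expr> * <expr> | int."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from one <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo] == "int" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("int + int * int".split()))  # 2
```

A count greater than 1 for any sentence is enough to show that the grammar is ambiguous.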

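A non-ambiguous grammar
    <expr> → <expr> + <term> | <term>
    <term> → <term> * int | int

Derivation
    <expr> => <expr> + <term>
           => <term> + <term>
           => int + <term>
           => int + <term> * int
           => int + int * int

[Figure: parse tree for int + int * int]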

Associativity of operators
- Operator associativity can also be indicated by a grammar
    <expr> → <expr> + <expr> | int    (ambiguous)
    <expr> → <expr> + int | int       (unambiguous)
- Example: a parse tree using the unambiguous grammar
  - The unambiguous grammar is left recursive and produces a parse tree in which the order of addition is left associative
  - Addition is performed in a left-to-right manner

[Figure: parse tree for int + int + int]

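Dangling-else problem
- Consider the grammar
    <stmt> → <if_stmt> | <other_stmt>
    <if_stmt> → if <logic_expr> then <stmt>
              | if <logic_expr> then <stmt> else <stmt>
- This grammar is ambiguous since
    if <logic_expr> then if <logic_expr> then <stmt> else <stmt>
  has distinct parse trees represented by
    if <logic_expr> then ( if <logic_expr> then <stmt> else <stmt> )
  and
    if <logic_expr> then ( if <logic_expr> then <stmt> ) else <stmt>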

Dangling-else problem
- Most languages match each else with the nearest preceding elseless if
- The ambiguity can be eliminated by developing a grammar that distinguishes elseless ifs from ifs with else clauses
  - See text, page 131

Extended BNF (denoted EBNF)
- Three abbreviations are added for convenience
  - Optional parts on the RHS of a production rule can be placed in brackets
      [ ]
  - Braces on the RHS indicate that the enclosed part may be repeated 0 or more times
      { }
  - When a single element must be chosen from a group, the options are placed in parentheses and separated by vertical bars
      ( a | b | c )

Extended BNF examples
- Brackets
    <proc_call> → ident [ ( <expr_list> ) ]
  - Generates: myProcedure and myProcedure( a, b, c )
- Braces
    <ident_list> → ident { , ident }
  - Generates: Larry, Curly, Moe
- Choice among options
    <expr> → int ( + | - ) int
  - Generates: 5 + 7 and 5 - 7

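BNF and EBNF example
- BNF:
    <expr> → <expr> + <term> | <expr> - <term> | <term>
    <term> → <term> * <factor> | <term> / <factor> | <factor>
- EBNF:
    <expr> → <term> { ( + | - ) <term> }
    <term> → <factor> { ( * | / ) <factor> }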

Extended BNF
- EBNF uses metasymbols |, {, }, (, ), [, and ]
  - When metasymbols are also terminal symbols in the language being defined, instances that are terminal symbols must be quoted
      <proc_call> → ident [ '(' <expr_list> ')' ]
- When regular BNF indicates that an operator is left associative, the corresponding EBNF does not
    BNF: <expr> → <expr> + int
    EBNF: <expr> → int { + int }
  - This must be overcome during syntax analysis
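One common way a parser restores left associativity from an EBNF rule is to fold each repetition into the running result. The sketch below assumes the EBNF rules `<expr> → <term> { (+ | -) <term> }` and `<term> → <factor> { (* | /) <factor> }`, takes `<factor>` to be an integer literal, and uses integer division; all of these are illustrative assumptions.

```python
def evaluate(tokens):
    """Evaluate a token list using one loop per EBNF repetition."""
    pos = 0

    def factor():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"integer expected at token {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    def term():
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            right = factor()
            # Folding into `value` makes the operator left associative.
            value = value * right if op == "*" else value // right
        return value

    def expr():
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            right = term()
            value = value + right if op == "+" else value - right
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate("10 - 3 - 2".split()))  # 5, that is (10 - 3) - 2
print(evaluate("2 + 3 * 4".split()))   # 14
```

Subtraction makes the grouping visible: a right-associative reading of 10 - 3 - 2 would give 9 instead of 5.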

Extended BNF
- Sometimes a superscript + is used as an additional metasymbol to indicate one or more repetitions
- Example: The production rules
    <compound> → begin <stmt> { <stmt> } end
  and
    <compound> → begin { <stmt> }+ end
  are equivalent

BNF homework assignmentAnnounced in class

Attribute grammars
- Context-free grammars (CFGs) cannot describe all of the syntax of programming languages
- Typical example: a variable must be declared before it can be referenced
  - Something like this is called a context-sensitive constraint
  - The text refers to it as static semantics

Attribute grammars
- Static semantics refers to the legal form of a program
  - This is actually syntax rather than semantics
  - The term semantics is used because the check is done during semantic analysis rather than during parsing
  - The term static is used because the analysis required to check the constraint can be done at compile time

Attribute grammars (AGs)
- An attribute grammar is an extension to a CFG
  - Concept developed by Donald Knuth in 1968
- The additional AG features describe static semantics
  - These features carry some semantic information along through parse trees
- Additional features
  - Attributes
    - Can be assigned values like variables
  - Attribute computation functions
    - Specify how attribute values are calculated
  - Predicate functions
    - Do the checking

Attribute grammars defined
- Definition: An attribute grammar is a context-free grammar G = (T, N, P, S) with the following additions:
  - For each grammar symbol X there is a set A(X) of attributes
    - Some of these are synthesized
      - These pass information up the parse tree
    - The remaining attributes are inherited
      - These pass information down the parse tree
  - Each production rule has a set of attribute computation functions that define certain attributes for the nonterminals in the rule
  - Each production rule has a (possibly empty) set of predicate functions to check for attribute consistency

Attribute grammars defined
- Let X0 → X1 ... Xn be a rule
- Synthesized attributes are computed with functions of the form
    S(X0) = f(A(X1), ... , A(Xn))
  - S(X0) depends only on X0's child nodes
- Inherited attributes for symbols Xj on the RHS are computed with functions of the form
    I(Xj) = f(A(X0), ... , A(Xn))
  - I(Xj) depends on Xj's parent as well as its siblings

Attribute grammars defined
- Initially, there are synthesized intrinsic attributes on the leaves
- When all attributes of a parse tree have been computed, the parse tree is fully attributed
- Predicate functions for X0 → X1 ... Xn are Boolean functions defined over the attribute set {A(X0), ... , A(Xn)}
  - For a program to be correct, every predicate function for every production rule must be true
  - Any false predicate function value indicates a violation of the static semantics of the language

Attribute grammars
- Example: expressions of the form id + id
  - The ids can be either int_type or real_type
  - The types of the two ids must be the same
  - The type of the expression must match its expected type
- BNF:
    <assign> → <var> = <expr>
    <expr> → <var> + <var>
    <var> → id
- Attributes:
  - actual_type
    - Synthesized for <var> and <expr>
    - Intrinsic for id
  - expected_type
    - Inherited for <expr> from <var> in <assign> → <var> = <expr>

The attribute grammar
- Syntax rule: <expr> → <var>[1] + <var>[2]
  - Attribute computation function:
      <expr>.actual_type ← <var>[1].actual_type
  - Predicates:
      <var>[1].actual_type == <var>[2].actual_type
      <expr>.expected_type == <expr>.actual_type
- Syntax rule: <var> → id
  - Attribute computation function:
      <var>.actual_type ← lookup (id.type)

Attribute grammars
- In what order are attribute values computed?
  - If all attributes were inherited, the tree could be decorated in top-down order
  - If all attributes were synthesized, the tree could be decorated in bottom-up order
  - In many cases, both kinds of attributes are used, and it is some combination of top-down and bottom-up that must be used
  - Complex problem in general
    - May require construction of a dependency graph showing all attribute dependencies

Computation of attributes
- For the generated expression: sum + increment
    1. <expr>.expected_type ← inherited from parent
    2. <var>[1].actual_type ← lookup (sum.type)
       <var>[2].actual_type ← lookup (increment.type)
    3. <var>[1].actual_type =? <var>[2].actual_type
    4. <expr>.actual_type ← <var>[1].actual_type
    5. <expr>.actual_type =? <expr>.expected_type
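The attribute computation for sum + increment can be sketched in a few lines. The symbol table contents, the `lookup` helper, and the `check_expr` function are hypothetical; only the attribute names and the two predicates come from the slides.

```python
# Hypothetical symbol table: maps each id to its declared type.
SYMBOL_TABLE = {"sum": "int_type", "increment": "int_type", "rate": "real_type"}

def lookup(name):
    """Intrinsic attribute: the declared type of an id."""
    return SYMBOL_TABLE[name]

def check_expr(id1, id2, expected_type):
    """Decorate the expression id1 + id2 and evaluate its predicates.

    Returns (actual_type, ok); ok is True only if both predicates hold.
    """
    var1_actual = lookup(id1)      # actual_type of the first operand
    var2_actual = lookup(id2)      # actual_type of the second operand
    expr_actual = var1_actual      # synthesized actual_type of the expression
    same_operand_types = var1_actual == var2_actual
    matches_expected = expr_actual == expected_type  # expected_type is inherited
    return expr_actual, same_operand_types and matches_expected

print(check_expr("sum", "increment", "int_type"))  # ('int_type', True)
print(check_expr("sum", "rate", "int_type"))       # ('int_type', False)
```

A False result corresponds to a violated predicate, that is, a static semantics error.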

Semantics
- The meaning of expressions, statements, and program units is known as dynamic semantics
- We consider three methods of describing dynamic semantics
  - Operational semantics
  - Axiomatic semantics
  - Denotational semantics

Operational semantics
- Operational semantics describes the meaning of a language statement by executing the statement on a machine, either real or simulated
- The meaning of the statement is defined by the observed change in the state of the machine
  - i.e., the change in memory, registers, etc.

Operational semantics
- The best approach is to use an idealized, low-level virtual computer, implemented as a software simulation
- Then, build a translator to translate source code to the machine code of the idealized computer
- The state changes in the virtual machine brought about by executing the code that results from translating a given statement define the meaning of the statement
- In effect, this describes the meaning of a high-level language statement in terms of the statements of a simpler, low-level language

Operational semantics example
- The C statement
    for ( expr1; expr2; expr3 ) { ... }
  is equivalent to:
    expr1;
    loop: if expr2 = 0 goto out
          ...
          expr3;
          goto loop
    out:
- The human reader can informally be considered to be the virtual computer
- Evaluation of operational semantics:
  - Good if used informally (language manuals, etc.)
  - Based on lower-level languages, not mathematics and logic

Operational semantics homeworkAssigned in class

Axiomatic semantics
- Based on formal logic (predicate calculus)
- Original purpose: formal program verification
- Each statement in a program is both preceded by and followed by an assertion about program variables
  - Assertions are also known as predicates
  - Assertions will be written with braces { } to distinguish them from program statements

Axiomatic semantics
- A precondition is an assertion immediately before a statement that describes the relationships and constraints among variables that are true at that point in execution
- A postcondition is an assertion immediately following a statement that describes the situation at that point
- Our point of view is to compute the preconditions for a given statement from the corresponding postconditions
  - It is also possible to set things up in the opposite direction
- A weakest precondition is the least restrictive precondition that will guarantee the validity of the associated postcondition

Axiomatic semantics
- Notation: {P} S {Q}
  - P is the precondition
  - S is a statement
  - Q is the postcondition
- Example
  - Find the weakest precondition P for: {P} a = b + 1 {a > 1}
  - One possible precondition: {b > 10}
  - Weakest precondition: {b > 0}

Axiomatic semantics
- If the weakest precondition can be computed for each statement in a program, then a correctness proof can be constructed for the program
- Start by using the desired result as the postcondition of the last statement and work backward
- The resulting precondition of the first statement defines the conditions under which the program will compute the desired result
- If this precondition is the same as the program specification, the program is correct

Axiomatic semantics
- Weakest preconditions can be computed using an axiom or using an inference rule
  - An axiom is a logical statement assumed to be true
  - An inference rule is a method of inferring the truth of one assertion on the basis of the values of other assertions
- Each statement type in the language must have an axiom or an inference rule
- We consider assignments, sequences, selection, and loops

Assignment statements
- Let x = E be a generic assignment statement
- An axiom giving the precondition is sufficient in this case:
    {Q[x→E]} x = E {Q}
- Here the weakest precondition P is given by Q[x→E]
  - In other words, P is the same as Q with all instances of x replaced by expression E
- For example, consider a = a + b - 3 {a > 10}
  - Replace all instances of a in {a > 10} by a + b - 3
  - This gives a + b - 3 > 10, or b > 13 - a
  - So, Q[x→E] is { b > 13 - a }

Inference rules
- The general form of an inference rule is
    S1, S2, S3, ..., Sn
    -------------------
             S
- This states that if S1, S2, S3, ..., and Sn are true, then the truth of S can be inferred

The Rule of Consequence
    {P} S {Q}, P' => P, Q => Q'
    ---------------------------
            {P'} S {Q'}
- Here, => means "implies"
- This says that a postcondition can always be weakened and a precondition can always be strengthened
- Thus in the earlier example
  - the postcondition { a > 10 } can be weakened to { a > 5 }
  - the precondition { b > 13 - a } can be strengthened to { b > 15 - a }

Sequence statements
- Since a precondition for a sequence depends on the statements in the sequence, the weakest precondition cannot be described by an axiom
  - An inference rule is needed for sequences
- Consider the sequence S1; S2 of two statements with preconditions and postconditions as follows:
    {P1} S1 {P2}
    {P2} S2 {P3}
- The inference rule is:
    {P1} S1 {P2}, {P2} S2 {P3}
    --------------------------
         {P1} S1; S2 {P3}

Sequence statements example
- Consider the following sequence and postcondition
    y = 3*x + 1; x = y + 3 { x < 10 }
- The weakest precondition for x = y + 3 is { y < 7 }
- Since this is the postcondition for y = 3*x + 1, the weakest precondition for the sequence is { x < 2 }
  - 3*x + 1 < 7 gives x < 2
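The claimed weakest precondition { x < 2 } can be sanity-checked by brute force over a range of integer inputs. This is only a test over sample states, not a proof; the function names are made up for the sketch.

```python
def run_sequence(x):
    """Execute y = 3*x + 1; x = y + 3 and return the final x."""
    y = 3 * x + 1
    x = y + 3
    return x

def postcondition(x):
    return x < 10

def claimed_precondition(x):
    return x < 2

# A weakest precondition holds for exactly those initial states from
# which the postcondition holds after execution.
mismatches = [x for x in range(-100, 101)
              if claimed_precondition(x) != postcondition(run_sequence(x))]
print(mismatches)  # []
```

An empty list means that, on the sampled integers, x < 2 is true exactly when the sequence ends with x < 10.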

Selection statements
- Consider only if-then-else statements
- The inference rule is
    { B and P } S1 { Q }, { (not B) and P } S2 { Q }
    ------------------------------------------------
           { P } if B then S1 else S2 { Q }
- Example: if ( x > 0 ) then y = y - 5 else y = y + 3 { y > 0 }
  - The precondition for S2 is { x <= 0 } and { y > -3 }
  - The precondition for S1 is { x > 0 } and { y > 5 }
  - What is P?
    - Note that { y > 5 } => { y > -3 }
    - By the rule of consequence, P is { y > 5 }

Loops
- We consider a logical pretest (while) loop
    {P} while B do S end {Q}
- Computing the weakest precondition is more difficult than for a sequence because the number of iterations is not predetermined
- An assertion called a loop invariant must be found
- Finding a loop invariant corresponds to finding the inductive hypothesis when proving a mathematical theorem using induction

Loops
- The inference rule is
                  { I and B } S { I }
    ------------------------------------------
    { I } while B do S end { I and (not B) }
  where I is the loop invariant
- The loop invariant must satisfy each of the following
  - P => I (the loop invariant must be true initially)
  - {I and B} S {I} (I is not changed by the body of the loop)
  - (I and (not B)) => Q (if I is true and B is false, Q is implied)
  - The loop terminates (this can be difficult to prove)

Example
- Consider the loop: { P } while y <> x do y = y + 1 end { y = x }
- An appropriate loop invariant is: I = { y <= x }

Loops
- The loop invariant I is a weakened version of the loop postcondition, and it is also a precondition
- I must be weak enough to be satisfied prior to the beginning of the loop
- When combined with the loop exit condition, I must be strong enough to force the truth of the postcondition
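The conditions on a loop invariant can be exercised at run time. The sketch below assumes the example loop is `while y <> x do y = y + 1 end` with postcondition y = x and invariant y <= x; `check_invariant` and its `max_steps` guard are hypothetical, and running it on sample states is a test, not a proof.

```python
def check_invariant(x, y, max_steps=1000):
    """Run the loop from a state satisfying y <= x, asserting the
    invariant at each point the inference rule mentions."""
    assert y <= x                      # P => I: invariant holds initially
    steps = 0
    while y != x:                      # B
        y = y + 1                      # S
        assert y <= x                  # {I and B} S {I}
        steps += 1
        assert steps <= max_steps      # crude stand-in for a termination proof
    assert y <= x and y == x           # (I and not B) => Q
    return y

results = [check_invariant(x, y) == x
           for x in range(-5, 6) for y in range(-5, x + 1)]
print(all(results))  # True
```

Starting from a state with y > x violates the invariant immediately, which matches the fact that the loop would then never terminate.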

Axiomatic semantics
- Evaluation of axiomatic semantics
  - Developing axioms or inference rules for all of the statements in a language can be difficult
  - Axiomatic semantics is . . .
    - a good tool for correctness proofs
    - an excellent framework for reasoning about programs
  - Axiomatic semantics is not as useful for language users and compiler writers

Axiomatic semantics homeworkAssigned in class

Denotational semantics
- Denotational semantics
  - Is the most rigorous, widely known method for describing the meaning of programs
  - Based on recursive function theory
- Fundamental concept
  - Define a mathematical object for each language entity
    - The mathematical objects can be rigorously defined and manipulated
  - Define functions that map instances of the language entities onto instances of the corresponding mathematical objects

Denotational semantics
- As is the case with operational semantics, the meaning of a language construct is defined in terms of the state changes
- In denotational semantics, state is defined in terms of the various mathematical objects
  - State is defined only in terms of the values of the program's variables
  - The value of a variable is an instance of an appropriate mathematical object

Denotational semantics
- The state s of a program consists of the values of all its current variables
    s = {<i1, v1>, <i2, v2>, ..., <in, vn>}
  - Here, ik is a variable and vk is the associated value
  - Each vk is a mathematical object
- Most semantics mapping functions for program constructs map states to states
  - The state change defines the meaning of the program construct
- Expression statements (among others) map states to values

Denotational semantics
- Let VARMAP be a function that, when given a variable name and a state, returns the current value of the variable
    VARMAP(ik, s) = vk
- Any variable can have the special value undef
  - i.e., currently undefined

Denotational semantics example
- The syntax of decimal numbers is described by the EBNF grammar
    <dec_num> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
              | <dec_num> (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
- The denotational semantics of decimal numbers involves a semantic function that maps decimal numbers as strings of symbols into numeric values (mathematical objects)

Semantic function for decimal numbers
    Mdec('0') = 0, Mdec('1') = 1, ..., Mdec('9') = 9
    Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)
    Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1
    ...
    Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9
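The semantic function for decimal numbers translates almost directly into a recursive function on strings. In this sketch the name `m_dec` and the input validation are assumptions; the two cases mirror the single-digit and the number-followed-by-digit rules.

```python
DIGITS = "0123456789"

def m_dec(numeral):
    """Map a string of decimal digits to the integer it denotes."""
    if not numeral or not all(ch in DIGITS for ch in numeral):
        raise ValueError(f"not a decimal numeral: {numeral!r}")
    last_digit = DIGITS.index(numeral[-1])
    if len(numeral) == 1:
        return last_digit                          # single-digit case
    return 10 * m_dec(numeral[:-1]) + last_digit   # prefix followed by a digit

print(m_dec("350"))  # 350
```

The recursion follows the left-recursive shape of the grammar: the value of the prefix is computed first, then scaled by 10 and extended by the final digit.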

Denotational semantics of expressions
- Assume expressions consist of decimal integer literals, variables, or binary expressions having one arithmetic operator and two operands, each of which can only be a variable or integer literal
- The value of an expression is an integer
- The value of an expression is error if it involves an undef value
- Thus, expressions map onto Z ∪ {error}
    <expr> → <dec_num> | <var> | <binary_expr>
    <binary_expr> → <left_expr> <operator> <right_expr>
    <left_expr> → <dec_num> | <var>
    <right_expr> → <dec_num> | <var>
    <operator> → + | *

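Semantic function for expressions
    Me(<expr>, s) = case <expr> of
      <dec_num> => Mdec(<dec_num>)
      <var> => if VARMAP(<var>, s) == undef
                 then error
                 else VARMAP(<var>, s)
      <binary_expr> =>
        if (Me(<binary_expr>.<left_expr>, s) == error or
            Me(<binary_expr>.<right_expr>, s) == error)
          then error
        else if (<binary_expr>.<operator> == '+')
          then Me(<binary_expr>.<left_expr>, s) + Me(<binary_expr>.<right_expr>, s)
          else Me(<binary_expr>.<left_expr>, s) * Me(<binary_expr>.<right_expr>, s)
    end case
- An expression is mapped to a value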

Denotational semantics of assignments
- Assignment statements map states to states
    Ma(x = E, s) =
      if Me(E, s) == error
        then error
        else s' = {<i1, v1'>, <i2, v2'>, ..., <in, vn'>},
          where, for j = 1, 2, ..., n,
            vj' = VARMAP(ij, s) when ij <> x
            vj' = Me(E, s) when ij == x
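The mapping functions for expressions and assignments can be sketched as ordinary functions on a dictionary that plays the role of the state. The encoding of expressions (an int for a literal, a str for a variable, a tuple for a binary expression) and the names `varmap`, `m_e`, and `m_a` are assumptions made for this illustration.

```python
UNDEF = "undef"
ERROR = "error"

def varmap(name, state):
    """VARMAP: current value of a variable, or UNDEF if it has none."""
    return state.get(name, UNDEF)

def m_e(expr, state):
    """Map an expression and a state to an integer or ERROR."""
    if isinstance(expr, int):          # integer literal
        return expr
    if isinstance(expr, str):          # variable
        value = varmap(expr, state)
        return ERROR if value == UNDEF else value
    left, op, right = expr             # binary expression
    left_value, right_value = m_e(left, state), m_e(right, state)
    if ERROR in (left_value, right_value):
        return ERROR
    return left_value + right_value if op == "+" else left_value * right_value

def m_a(target, expr, state):
    """Map an assignment and a state to a new state, or ERROR."""
    value = m_e(expr, state)
    if value == ERROR:
        return ERROR
    new_state = dict(state)            # every other variable keeps its value
    new_state[target] = value
    return new_state

s = {"a": 2, "b": 3}
print(m_a("c", ("a", "*", "b"), s))    # {'a': 2, 'b': 3, 'c': 6}
print(m_a("c", ("a", "+", "z"), s))    # error
```

Returning a fresh dictionary instead of mutating the old one keeps the function a pure mapping from states to states, which is the point of the denotational view.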

Denotational semantics of logical pretest loops
- Logical pretest loops map states to states
- Assume Msl maps a statement list to a state
- Assume Mb maps a Boolean expression to a Boolean value or to error
    Ml( while B do L end, s ) =
      if Mb(B, s) == error
        then error
      else if Mb(B, s) == false
        then s
      else if Msl(L, s) == error
        then error
      else Ml( while B do L end, Msl(L, s) )

Denotational semantics of loops
- The meaning of the loop is the value of the program variables after the statements in the loop have been executed the prescribed number of times (assuming there have been no errors)
- In essence, the loop has been converted from iteration to recursion, where the recursive control is mathematically defined by other recursive state mapping functions
- Recursion, when compared to iteration, is easier to describe with mathematical rigor
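The conversion from iteration to recursion can be made concrete. In the sketch below, `m_b` and `m_sl` are hypothetical callables standing in for the Boolean-expression and statement-list mapping functions, and the state is a plain dictionary.

```python
ERROR = "error"

def m_l(m_b, m_sl, state):
    """Meaning of `while B do L end` in `state`.

    m_b(state) gives the value of the loop condition; m_sl(state) gives
    the state after one execution of the loop body.
    """
    condition = m_b(state)
    if condition == ERROR:
        return ERROR
    if condition is False:
        return state
    next_state = m_sl(state)
    if next_state == ERROR:
        return ERROR
    return m_l(m_b, m_sl, next_state)   # iteration expressed as recursion

final = m_l(lambda s: s["y"] != s["x"],
            lambda s: {**s, "y": s["y"] + 1},
            {"x": 3, "y": 0})
print(final)  # {'x': 3, 'y': 3}
```

Python does not eliminate tail calls, so this form is limited by the interpreter's recursion depth; it is meant to mirror the definition, not to replace a real loop.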

Denotational semantics
- Evaluation of denotational semantics:
  - Can be used to determine the meaning of complete programs in a given language
  - Provides a rigorous way to think about programs
  - Can be an aid to language design
