Wednesday, March 31, 2010

The DeadEnds Interpreter, Part Three

This (rather long) post describes the main Objective-C classes used in the implementation of the DeadEnds programming language interpreter. When program files are parsed the interpreter builds up a semantic representation of the modules (functions and procedures) defined in the program. This representation is a tree of objects; most of the objects are instances of different Expression and Statement classes implemented in Objective-C.

There are three main classes used to build the internal representations of programs -- Module, Expression and Statement. The sections that follow describe these three classes and many of their sub-classes.

Modules
Each function or procedure parsed from a DeadEnds program results in a Module object being created that is added to a table of Modules. As execution of the program proceeds, expressions in the module begin interpreted may call other modules. The Objective-C interface for the Module class is:

@interface Module : NSObject
{
InterpType type;
NSString* name;
NSArray* parameters;
BlockStatement* body;
}
- (id) initWithType: (NSNumber*) type name: (NSString*) name parameters: (NSArray*) parameters block: (BlockStatement*) block;
- (ReturnCode) interpretArguments: (NSArray*) arguments symbolTable: (SymbolTable*) symbolTable;
@end

A Module has a return type, a name, a list of parameters and a BlockStatement. During execution Modules are called by CallExpressions. When a CallExpression is evaluated, the name in the Expression is used to lookup the Module object in the Module table. The argument Expressions in the CallExpression are evaluated and assigned to the Module's parameters and used to initialize the symbol table that provides the context for interpreting the Module. The Module's BlockStatement is then interpreted.

The important method of the Module class is the interpArguments:symbolTable method that interprets the Module. The method requires two arguments, the list of argument Expressions to pass to the Module, and the SymbolTable at the base of the active SymbolTable chain that provides the execution context.

Expressions
Expressions are semantic objects created during parsing and they make up a substantial portion of the semantic trees. Generally each expression in the programs will be represented by an Expression object in the semantic representation. There is one abstract class that encompass all Expression classes, another abstract sub-class for numeric constants, and a number of concrete Expression sub-classes for the different types of expressions found in the programming language. The simplified Objective-C interface to the Expression classes is:

@interface Expression : NSObject
- (RValue*) evaluate: (SymbolTable*) symbolTable;
@end

The Expression class is the base of all Expression sub-classes. It handles line numbers (part of error handling, not shown), and defines the primary method that all Expression sub-classes must implement. This is the evaluate: method, that evaluates an Expression in the context of a chain of SymbolTables. Every evaluate method returns an RValue object, an object that wraps one of the 13 (very likely more later) types supported by the language. Evaluation of Expressions does not modify the Expressions; Expressions are semantic objects that are part of the internal semantic representation of Modules. Evaluation of an Expression creates a dynamic RValue object that represents the run-time value of the Expression in the context of the current programming state, which is encoded in the states of the accessible SymbolTables.

@interface Constant : Expression
{
NSNumber* number;
}
- (id) initWithNumber: (NSNumber*) number;
@end

The Constant class is the base of all numeric constants. A Constant is a container for an NSNumber, a foundation object that can hold any kind of basic numeric value.

@interface IntegerConstant : Constant
@end

@interface FloatConstant : Constant
@end

@interface BooleanConstant : Constant
@end

@interface CharacterConstant : Constant
@end

@interface StringConstant : Expression
{
NSString* string;
}
- (id) initWithString: (NSString*) string;
@end

A StringConstant holds a single string as an NSString, another foundation object. Evaluation of the numeric Constants and StringConstants consists of simply returning a copy of their values wrapped in RValue objects.

@interface Variable : Expression
{
NSString* name;
}
- (id) initWithName: (NSString*) name;
@end

A Variable object holds the name of a program variable. Evaluating a Variable object consists of looking the variable's name up in the current set of SymbolTables and returning the variable's current RValue. As we will see later SymbolTables map variable names to their LValues, and LValues refer to their corresponding RValues. There should be a little more written somewhere to discuss the differences between LValues and RValues and how the concepts are during evaluation and interpretation.

@interface BinaryExpression : Expression
{
Expression* leftExpression;
Expression* rightExpression;
Operator operator;
}
- (id) initLeft: (Expression*) left right: (Expression*) right operator: (NSNumber*) operator;
- (RValue*) evaluateDotExpression: (SymbolTable*) symbolTable;
- (RValue*) evaluateAssignExpression: (SymbolTable*) symbolTable;
- (RValue*) evaluateAndOrExpression: (SymbolTable*) symbolTable;
- (RValue*) evaluateRelationalExpression: (SymbolTable*) symbolTable;
- (RValue*) evaluateArithmeticExpression: (SymbolTable*) symbolTable;
@end

A BinaryExpression holds an expressions with left side and right side sub-Expressions and an operator. Evaluating a BinaryExpression involves recursively evaluating the two sub-expressions and then using the operator to compute the final RValue. Five methods are used for the five major types of BinaryExpressions. Dot-expressions implement method calls and structure field access; assign-expressions are assignment statements; and-or-expressions are logical expressions (their evaluation uses the usual short-circuiting rules which may avoid evaluation the right sub-expression); relational-expressions compare the sub-expressions; and arithmetic expressions perform arithmetic operations. Evaluation of BinaryExpressions may involve implicit type coercion rules.

@interface UnaryExpression : Expression
{
Expression* expression;
Operator operator;
}
- (id) initExpression: (Expression*) expression operator: (NSNumber*) operator;
@end

UnaryExpressions holds unary expressions, those with an operator (+, -, !) and a sub-Expression. Evaluating a UnaryExpression involves recursively evaluating the sub-Expression and then applying the operator to that value.

@interface CallExpression : Expression
{
NSString* name;
NSArray* arguments;
}
- (id) initWithName: (NSString*) name arguments: (NSArray*) arguments;
@end

CallExpressions are function and procedure calls (Method calls are handled by BinaryExpressions). Each CallExpression consists of the name of the Module to call and a list of arguments to pass in, where each is an Expression in its own right. Evaluation of a CallExpression involves looking up the name in a table to get a Module object, evaluating the arguments and binding them to the Module's parameters in a new symbol table, and then interpreting the Module's BlockStatement. If the BlockStatement terminates with a ReturnStatement with an Expression, the ReturnStatement evaluates that Expression into an RValue, and that RValue becomes the final value of the CallExpression. Not as complicated as it sounds.

Statements
Statements are semantic objects created during parsing and, along with Expressions, are the other main object found in semantic trees. Each language statement parsed from a program is represented as a Statement object in a semantic tree. Statements are recursive because many Statements (e.g., if-statements, while-statements, block statements) contain other statements. The simplified Objective-C interface for the Statement classes is:

@interface Statement : NSObject
- (ReturnCode) interpret: (SymbolTable*) symbolTable;
@end

The Statement class is the abstract base class for all Statement sub-classes. It handles line numbers (part of error handling, not shown) and defines the key method that all Statement sub-classes must implement. That is the interpret: method, that interprets (runs, executes) a Statement in the context of a set of symbol tables. Every interpret: method returns a return code used to direct follow-on actions. Normally the return code indicates the normal case and interpretation continues to the next Statement in the program. Describing the other return codes is a little advanced for this introduction.

@interface ExpressionStatement: Statement
{
Expression* expression;
}
- (id) initWithExpression: (Expression*) expression;
@end

An ExpressionStatement is syntactically an Expression followed by a semicolon. For example, an assignment statement is an ExpressionStatement where the Expression is a BinaryExpression with '=' as its operator. Interpretation of an ExpressionStatement consists of evaluating the Expression and accumulating whatever side effects that it causes (e.g., the change of a value in a symbol table).

@interface IfStatement: Statement
{
Expression* conditional;
Statement* thenStatement;
Statement* elseStatement;
}
- (id) initWithConditional: (Expression*) conditional thenStatement: (Statement*) thenStatement elseStatement: (Statement*) elseStatement;
@end

IfStatements implement conventional if-then-else Statements where the else Statement is optional. Interpretation of an IfStatement begins by evaluating the conditional Expression that is coerced to a Boolean value. If that value is true the then Statement is recursively interpreted; if that value is false and the else Statement exists, the else Statement is recursively interpreted.

@interface WhileStatement: Statement
{
Expression* conditional;
Statement* statement;
}
- (id) initWithConditional: (Expression*) conditional statement: (Statement*) statement;
@end

WhileStatements implement conventional while statement semantics. Interpretation of a WhileStatement begins by evaluating the conditional Expression and coercing its value to a Boolean. If the value is true interpretation continues with the interpretation of the body Statement; after the Statement is interpreted the Expression is reevaluated in the new context and the Statement will be interpreted again if the Expression's value is still true. This continues until the value of the Expression is false, at which point interpretation of the WhileStatement ends. Note that ReturnStatements, ContinueStatements and BreakStatements found within the body Statement implement the conventional rules for such statements.

@interface DoStatement: Statement
{
Statement* statement;
Expression* conditional;
}
- (id) initWithStatement: (Statement*) statement conditional: (Expression*) conditional;
@end

DoStatements implement conventional do-while statement semantics. Interpretation of DoStatements is similar to that of WhileStatements except that the order of evaluating the conditional Expression and interpreting the body Statement are different. As in WhileStatements, any ReturnStatement, ContinueStatement or BreakStatement found in the body Statement is handled in the conventional manner.

@interface BlockStatement : Statement
{
NSArray* statements;
}
- (id) initWithStatements: (NSArray*) statements;
@end

A BlockStatement is simply an array of Statements. BlockStatements are interpreted by interpreting each of its sub-Statements, from first to last, in sequence.

@interface Declaration : Statement
{
InterpType type;
NSArray* initializers;
}
- (id) initWithType: (NSNumber*) type initializers: (NSArray*) initializers;
@end

Declarations are Statements that declare new Variables and optionally assign them their initial values. In the DeadEnds programming language Declarations are treated exactly as other Statements and can appear anywhere any other Statement can occur. Interpretation of a Declaration causes the Variables to be added to the SymbolTable associated with the BlockStatement the Declarations in (or to the global SymbolTable if the Declaration is outside of any Module definition). If a Variable has an initializing Expression, the Expression is evaluated normally, that is, in the context of the current SymbolTable, and the new Variable is assigned given that initial RValue in the SymbolTable. Initializing Expressions can refer to Variables that were declared earlier in the same BlockStatement. This is exactly what one would hope would happen.

@interface ReturnStatement: Statement
{
Expression* returnExpression;
}
- (id) initWithExpression: (Expression*) expression;
@end

ReturnStatements represent return statements in DeadEnds programs. When a ReturnStatement is interpreted control returns from the current Module to the CallExpression in the Module that called it. If the ReturnStatement has an Expression (it is optional), the Expression is first evaluated in the context of the current set of SymbolTables, and the resulting RValue is returned to the calling Module as the value of the CallExpression.

@interface ContinueStatement: Statement
@end

What you'd expect, the semantic object representing a continue statement in a DeadEnds program. Interpreting a ContinueStatement causes interpretation control to move to the next iteration of the WhileStatement or DoStatement the ContinueStatement is embedded in.

@interface BreakStatement: Statement
@end

What you'd expect, the semantic object representing a break statement in a DeadEnds program. Interpreting a BreakStatement causes interpretation control to break out of the current control structure using the conventional rules for break statements.

Values
Values are values created at run-time when program Modules are being interpreted. Values are always the result of evaluating Expressions within the context of a chain of symbol tables. There are two types of values, RValues and LValues, that are described in more detail below. The simplified Objective-C interface to the two value classes is:

@interface RValue : NSObject
{
InterpType type;
id value;
BOOL hasLValue;
}
- (id) initWithType: (InterpType) type value: (id) value isLValue: (BOOL) isLValue;
- (id) initWithInterpValue: (RValue*) interpValue;
- (BOOL) boolValue;
- (RValue*) coerceType: (InterpType) toType;
@end

RValue objects are computed by the interpreter at run-time by evaluating Expressions. An RValue can hold any one of the many (currently 15) valur types supported by the DeadEnds programming language (Boolean, Character, Integer, Float, String, List, Set, Table, Node, Record, Person, Family, Void, Any and Error). The type field indicates the specific type, and the value field holds a foundation object (usually an NSNumber or NSString) that holds the actual value. The hasLValue field is true if the RValue is referenced by an LValue, which generally means it is the value of a Variable. The RValue class has two methods to initialize an RValue, a method to return the coercion of the value to a Boolean value, and the coerceType: method that attempts to coerce the RValue from one type to another. If the coercion is not possible the method return an RValue with the Error type.

@interface LValue : NSObject
{
RValue* rValue;
}
- (id) initWithRValue: (RValue*) anRValue;
@end

LValues are objects that hold references to RValues. LValues are used in SymbolTables. A SymbolTable is a map (implemented by an NSMutableDictionary, another foundation class) from the names of Variables to their LValues. In a normal compiler an l-value is the actual location in memory where an r-value is located. In the object-oriented DeadEnds interpreter, an LValue becomes a reference (effectively a location) to another object, the RValue that holds the Variable's actual value. When an assignment BinaryExpression is evaluated, the RValue of the Variable on the left side of the assignment operator is changed to hold the RValue computed by evaluating the Expression on the right side of the operator. Since the Variable's RValue is being pointed to by the its LValue, this change to the RValue effectively changes the value of the Variable itself through the LValue mechanism. Note that the evaluation logic does not need to know if an RValue is the target to an LValue or not. The only reason why the evaluation logic cares about the existence of LValues is for issuing warnings if an assignment statement would fail to change the program's state because the left hand side is not a Variable or a component of a structured Variable. Again, it's not nearly as complicated as it sounds.

Symbol Tables
SymbolTables are run-time objects that map Variable names to their LValues and the LValues refer to their RValues. At any given time there is a chain of active SymbolTables. At the top of the chain is the global SymbolTable that holds any global Variables that were defined outside of any Module. At the bottom of the chain is the SymbolTable holding the Variables declared in the deepest BlockStatement currently being interpreted. Between the top SymbolTable and the bottom SymbolTables are intermediate SymbolTables holding the Variables defined in any open and pending BlockStatements between the current location in the program and the BlockStatement at the Module level. Note that if a BlockStatement does not have any DeclarationStatements the interpreter does not create a SymbolTable for that BlockStatement. When looking up a Variable in a SymbolTable, the SymbolTable at the bottom of the chain is searched first. If the Variable cannot be found in the bottom SymbolTables, the SymbolTables are searched one by one, up the chain, until the global SymbolTable is reached and searched. If the Variable cannot be found after all SymbolTables have been checked the Variable is undefined and an Error object is returned.

@interface SymbolTable : NSObject
{
NSMutableDictionary* nameValueTable;
SymbolTable* parent;
}
- (id) initWithParent: (SymbolTable*) parent;
- (BOOL) contains: (NSString*) name;
- (LValue*) valueOfName: (NSString*) name;
- (void) addDeclaration: (Declaration*) declaraction;
- (void) setValue: (LValue*) lValue ofName: (NSString*) name;
- (RValue*) lookup: (NSString*) name;
@end

No comments:

Post a Comment