DeadEnds Genealogical Software

Thursday, April 15, 2010

DeadEnds Interpreter, Expression Values

The most common operation that occurs when the DeadEnds Interpreter is running a program is expression evaluation. Within the context of the interpreter an expression is a semantic entity, an object of a sub-class of TWInterpExpression, created when a DeadEnds program was parsed. Expressions are evaluated by their evaluate: methods, which are passed a symbol table (an object of class TWInterpSymbolTable) that holds the values of the variables at the time of the call.

Here are the forms that a value can take.

As a TWInterpExpression

A TWInterpExpression object is the semantic representation of a bit of programming language syntax, part of the internal form of a parsed module. These expressions are recursive. For example, the expression found in a DeadEnds program:

i*alpha + b*21

is represented as a tree of seven TWInterpExpressions, three for the variables, one for the integer constant, two for the multiplication operators, and one for the plus operator. It is better to think of this structure as the potential for computing a value, rather than as the value itself. The process of evaluation, implemented by the evaluate: methods, is to convert these trees of semantic objects into a single real value. The evaluate: method for an integer constant expression (an object of type TWInterpIntegerConstant, a subclass of TWInterpExpression) returns the integer constant as an R-Value (see below). The evaluate: method for a variable (an object of type TWInterpVariable, also a subclass of TWInterpExpression) looks up the variable name in the sequence of symbol tables that are currently active and returns the variable's value as an R-Value. The evaluate: method for a binary expression (an object of type TWInterpBinaryExpression, also a subclass of TWInterpExpression) recursively calls the evaluate: methods on its left and right sub-expressions to get two R-Values, combines those values using the mathematical operator, and returns the overall result as a new R-Value.
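
The recursion described above can be sketched in a few lines. This is only an illustration, not the interpreter's code: the Tree classes are hypothetical stand-ins for the TWInterp classes, values are plain NSNumbers instead of R-Values, the symbol table is a plain NSDictionary, and the sketch is written for ARC to keep it short.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-ins for TWInterpExpression and its sub-classes.
@interface TreeExpression : NSObject
- (NSNumber *)evaluate:(NSDictionary *)symbols;
@end
@implementation TreeExpression
- (NSNumber *)evaluate:(NSDictionary *)symbols { return nil; }  // abstract
@end

@interface TreeConstant : TreeExpression
@property (nonatomic, strong) NSNumber *number;
@end
@implementation TreeConstant
- (NSNumber *)evaluate:(NSDictionary *)symbols { return self.number; }
@end

@interface TreeVariable : TreeExpression
@property (nonatomic, copy) NSString *name;
@end
@implementation TreeVariable
// Look the name up in the (single, simplified) symbol table.
- (NSNumber *)evaluate:(NSDictionary *)symbols { return symbols[self.name]; }
@end

@interface TreeBinary : TreeExpression
@property (nonatomic, strong) TreeExpression *left;
@property (nonatomic, strong) TreeExpression *right;
@property (nonatomic, assign) unichar op;   // only '*' and '+' in this sketch
@end
@implementation TreeBinary
- (NSNumber *)evaluate:(NSDictionary *)symbols {
    // Recursively evaluate both sides, then combine with the operator.
    NSInteger l = [[self.left evaluate:symbols] integerValue];
    NSInteger r = [[self.right evaluate:symbols] integerValue];
    return @(self.op == '*' ? l * r : l + r);
}
@end

static TreeExpression *TreeVar(NSString *name) {
    TreeVariable *v = [[TreeVariable alloc] init];
    v.name = name;
    return v;
}

static TreeExpression *TreeInt(NSInteger n) {
    TreeConstant *c = [[TreeConstant alloc] init];
    c.number = @(n);
    return c;
}

static TreeExpression *TreeBin(TreeExpression *l, unichar op, TreeExpression *r) {
    TreeBinary *b = [[TreeBinary alloc] init];
    b.left = l;
    b.op = op;
    b.right = r;
    return b;
}

// Builds the seven-node tree for i*alpha + b*21 and evaluates it.
static NSInteger TreeDemo(void) {
    TreeExpression *tree = TreeBin(TreeBin(TreeVar(@"i"), '*', TreeVar(@"alpha")),
                                   '+',
                                   TreeBin(TreeVar(@"b"), '*', TreeInt(21)));
    NSDictionary *symbols = @{ @"i": @2, @"alpha": @3, @"b": @4 };
    return [[tree evaluate:symbols] integerValue];   // 2*3 + 4*21 = 90
}
```

The tree itself is never modified by evaluation; calling TreeDemo() again with different symbol values would reuse the same seven objects.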

As final, evaluated values

These values are Booleans, Unicode characters, integers, floating point numbers, Unicode strings, lists, sets, tables, nodes and records. These are the types that a DeadEnds programmer thinks in. The DeadEnds Interpreter allows the user to write programs and think in terms of values of these types.

Behind the scenes the DeadEnds Interpreter must represent these values in a consistent form. There are four of these forms:

Primitive values

Values at this level are the lowest form possible. For example, an integer is a 32- or 64-bit word on the computer. A list is an NSArray from the Foundation framework.

Boxed Values

Values at the next level up are Objective-C objects that represent the primitive values. Boxing is the term used to mean creating an object type for a primitive type in order to allow values of the primitive type to have all the benefits of objecthood. In the DeadEnds Interpreter, values of the Boolean, Character, Integer, and Float types are all boxed as NSNumber objects from the Cocoa Foundation framework. The String type is represented at the primitive level as an NSString, which is already an object, so a String has the same value at both the primitive and boxed levels. The List, Set, and Table values at this level are the same as at the primitive level, objects of the Foundation classes NSArray, NSSet and NSDictionary. The Primitive and Boxed values of Node and Record objects are TWGedcomNode and TWGedcomRecord objects, which are classes defined in the TWGedcom Objective-C library.

R-Values

An R-Value is nothing more than a Boxed Value and a type code to indicate what kind of value is boxed. The values returned by all evaluate: methods are R-Values.
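
To see why the type code matters, consider that a Boolean and an Integer are both boxed as NSNumbers and can look identical once boxed. The sketch below is an assumption-laden miniature, not TWInterpRValue itself: BoxType, BoxRValue and BoxDemo are hypothetical names, only four type codes are shown, and ARC is assumed for brevity.

```objc
#import <Foundation/Foundation.h>

// Hypothetical type codes; the real interpreter supports more types.
typedef NS_ENUM(NSInteger, BoxType) {
    BoxTypeBoolean,
    BoxTypeInteger,
    BoxTypeFloat,
    BoxTypeString
};

// An R-Value in miniature: a boxed value plus a code saying what is boxed.
@interface BoxRValue : NSObject
@property (nonatomic, readonly) BoxType type;
@property (nonatomic, readonly, strong) id boxed;
- (instancetype)initWithType:(BoxType)type boxed:(id)boxed;
@end

@implementation BoxRValue
- (instancetype)initWithType:(BoxType)type boxed:(id)boxed {
    if ((self = [super init])) {
        _type = type;
        _boxed = boxed;
    }
    return self;
}
@end

// Returns YES when two R-Values box numerically equal NSNumbers
// but are still distinguishable by their type codes.
static BOOL BoxDemo(void) {
    BoxRValue *flag = [[BoxRValue alloc] initWithType:BoxTypeBoolean
                                                boxed:[NSNumber numberWithBool:YES]];
    BoxRValue *count = [[BoxRValue alloc] initWithType:BoxTypeInteger
                                                 boxed:[NSNumber numberWithInteger:1]];
    BOOL sameBoxedNumber = [flag.boxed integerValue] == [count.boxed integerValue];
    return sameBoxedNumber && flag.type != count.type;
}
```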

L-Values

L-Values are references to R-Values, and that's all that they are. A DeadEnds symbol table (an object of class TWInterpSymbolTable) contains a table (NSMutableDictionary) that maps variable names (NSStrings) to L-Values (objects of class TWInterpLValue). That L-Value refers to the R-Value that holds the boxed value of the variable. Consider the following code from a DeadEnds program:

Integer i = 0;

i = i + 1;

After the first line, the declaration of i, is interpreted there will be a new mapping added to the current symbol table. The variable name i maps to a newly created L-Value that refers to a newly created R-Value that contains a boxed NSNumber object that contains the Integer 0. The second line is an assignment expression: a top-level binary expression whose left expression is the variable i and whose right expression is the binary expression i + 1. To evaluate an assignment expression, the right hand side is first evaluated into an R-Value. In this case that R-Value is computed by looking up the R-Value of the variable i in the symbol table and adding 1 to it, giving a new value of 1. That value is boxed into an NSNumber and put in a new R-Value. Now the actual assignment occurs. Here the evaluator looks up the variable in the symbol table, this time interested in its L-Value and the R-Value that the L-Value refers to. The boxed value of the R-Value computed from the right hand side replaces the boxed value of the R-Value that the L-Value refers to. This has the desired effect of changing the value of the variable i. The next time an expression includes this variable, the variable's L-Value refers to an R-Value that contains a boxed NSNumber object with a primitive value of 1. It's not as complicated as this description probably makes it seem.
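
The two statements can be traced in code. The following is a rough sketch under stated assumptions: AssignRValue and AssignLValue are hypothetical miniatures of TWInterpRValue and TWInterpLValue, the type code is omitted, the symbol table is a bare NSMutableDictionary, and ARC is assumed to keep the listing short.

```objc
#import <Foundation/Foundation.h>

// Hypothetical miniature R-Value: just the boxed value.
@interface AssignRValue : NSObject
@property (nonatomic, strong) NSNumber *boxed;
@end
@implementation AssignRValue
@end

// Hypothetical miniature L-Value: a reference to an R-Value.
@interface AssignLValue : NSObject
@property (nonatomic, strong) AssignRValue *rValue;
@end
@implementation AssignLValue
@end

static NSInteger AssignDemo(void) {
    NSMutableDictionary *symbolTable = [NSMutableDictionary dictionary];

    // Integer i = 0;   name -> L-Value -> R-Value -> boxed 0
    AssignRValue *initial = [[AssignRValue alloc] init];
    initial.boxed = @0;
    AssignLValue *lValue = [[AssignLValue alloc] init];
    lValue.rValue = initial;
    symbolTable[@"i"] = lValue;

    // i = i + 1;   first evaluate the right hand side into a new R-Value
    AssignLValue *found = symbolTable[@"i"];
    AssignRValue *rhs = [[AssignRValue alloc] init];
    rhs.boxed = @([found.rValue.boxed integerValue] + 1);

    // ...then replace the boxed value inside the R-Value the L-Value refers to.
    found.rValue.boxed = rhs.boxed;

    // A later lookup of i goes through the same L-Value and sees the new value.
    AssignLValue *again = symbolTable[@"i"];
    return [again.rValue.boxed integerValue];   // 1
}
```

Note that the symbol table entry itself is never replaced; only the boxed value at the end of the chain changes.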

Tuesday, April 13, 2010

DeadEnds Interpreter, Memory Management

The DeadEnds interpreter is written in Objective-C using Apple's Foundation framework, part of the Cocoa platform. Two dynamic memory management models are supported in this environment, retain/release and garbage collection. I opted to use the retain/release model because it forces a more careful analysis of dynamic memory usage. In the retain/release model each allocated object has a retain count associated with it. As long as the retain count is greater than zero the object remains in existence. When the retain count drops to zero the system will reclaim the object. Any software that needs to maintain access to an object should increment its retain count (though see description of weak references below), and is responsible for decrementing (releasing) the retain count when access is no longer required. Simple rules are used for managing retain/release objects:

Allocation methods (alloc) and copy methods (name starts with copy) create objects that come into existence with a retain count of one. Code that calls these methods is responsible for releasing the object.
If software creates an object on behalf of a caller, it calls autorelease on the object before returning the object. This causes a release to occur in the future, giving the caller a chance to retain the object before it disappears.
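
A small sketch of the two rules follows. It uses manual retain/release, so it must be compiled without ARC (for example with -fno-objc-arc). RuleObject, RuleLiveCount and RuleDemo are hypothetical names introduced only for this illustration; the live-object counter exists so that the effect of each release can be observed.

```objc
#import <Foundation/Foundation.h>

// Number of RuleObject instances currently alive (illustration only).
static NSInteger RuleLiveCount = 0;

@interface RuleObject : NSObject
+ (RuleObject *)object;   // creates on behalf of the caller: autoreleased
@end

@implementation RuleObject
- (id)init {
    if ((self = [super init])) RuleLiveCount++;
    return self;
}
- (void)dealloc {
    RuleLiveCount--;
    [super dealloc];
}
+ (RuleObject *)object {
    // Rule 2: autorelease before returning to the caller.
    return [[[RuleObject alloc] init] autorelease];
}
@end

// Returns the number of live objects right after the pool is drained.
static NSInteger RuleDemo(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    RuleObject *owned = [[RuleObject alloc] init];   // rule 1: retain count 1, we must release
    RuleObject *borrowed = [RuleObject object];      // rule 2: release is pending in the pool
    [borrowed retain];                               // we want it to outlive the pool

    [owned release];                                 // count drops to zero, object reclaimed
    [pool drain];                                    // pending release happens; our retain keeps borrowed alive
    NSInteger liveAfterDrain = RuleLiveCount;        // 1

    [borrowed release];                              // now reclaimed as well
    return liveAfterDrain;
}
```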

When one object refers to a second object, and the first object retains a reference to the second object, the reference is called a strong reference. This is the usual convention. When the first object is released, it is responsible for first releasing its strongly referenced objects. If an object refers to a second object, but does not retain the reference to the second object, the reference is called a weak reference. Weak references are used in situations where there are specific conventions in use for memory management that can assure that the second object will not be released while the first object still requires access to it. As we will see below, this convention is used by one of the object sets employed by the DeadEnds interpreter.

During interpretation there are two sets of objects created. The first are the semantic values that make up the internal format of the program modules. These are the objects created by the parser when reading the program files. Object creation occurs within semantic actions, snippets of Objective-C code that run whenever a grammar rule is recognized and reduced. These objects are allocated and initialized by the semantic actions so they have a retain count of one. As parsing proceeds trees of these semantic objects are built up and ultimately each program module is represented by a single semantic object (a TWInterpModule) that has a tree of semantic objects below it representing the body of the module. Because each semantic object is created by the parser, and because there is no need for any other object to retain these objects, the references between all semantic objects and the objects they refer to are weak references. Therefore the initializer methods for semantic objects never retain references to other semantic objects. In fact, in the current version of the interpreter, once a module has been parsed and converted to semantic tree form it is never released.

The second set of objects are the values created by evaluating expressions while methods are being actively interpreted (executed/run). There are a number of expression classes, all sub-classes of TWInterpExpression. Each of these classes implements an evaluate: method that takes a symbol table (TWInterpSymbolTable) as an argument. Each evaluate: method returns an r-value (TWInterpRValue) object. The memory management rule used in this case is that r-values returned by evaluate: methods are already retained with a count of one. So evaluate: methods use the same rules as alloc and copy methods do: it is the responsibility of the callers of evaluate: methods (which are often evaluate: methods higher in a recursive evaluation of a complex expression) to release all TWInterpRValue objects they receive from the evaluate: methods they call.

To give an example of this second approach in action, here is pseudocode for the evaluate: method for a binary expression.

1. Call evaluate: on the left sub-expression resulting in a retained left r-value object.
2. Call evaluate: on the right sub-expression resulting in a retained right r-value object.
3. Create a new r-value (using alloc) with value computed from the values of the two just computed r-values. This new r-value has a retain count of one.
4. Release the two r-values computed in steps 1 and 2.
5. Return the resultant r-value computed in step 3.

The details are more complex than indicated here, but this gives a good introduction to the convention used.
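
The pseudocode above might translate into Objective-C roughly as follows. This is a sketch, not the interpreter's code: the Owned classes are hypothetical, r-values carry only an integer, the only operator is addition, and the symbol table argument is unused. It uses manual retain/release and so must be compiled without ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical cut-down r-value holding just an integer payload.
@interface OwnedRValue : NSObject {
    NSInteger payload;
}
- (id)initWithPayload:(NSInteger)aPayload;
- (NSInteger)payload;
@end
@implementation OwnedRValue
- (id)initWithPayload:(NSInteger)aPayload {
    if ((self = [super init])) payload = aPayload;
    return self;
}
- (NSInteger)payload { return payload; }
@end

// Convention from the text: evaluate: returns an r-value the caller must release.
@interface OwnedExpression : NSObject
- (OwnedRValue *)evaluate:(NSDictionary *)symbolTable;
@end
@implementation OwnedExpression
- (OwnedRValue *)evaluate:(NSDictionary *)symbolTable { return nil; }  // abstract
@end

@interface OwnedConstant : OwnedExpression {
    NSInteger number;
}
- (id)initWithNumber:(NSInteger)aNumber;
@end
@implementation OwnedConstant
- (id)initWithNumber:(NSInteger)aNumber {
    if ((self = [super init])) number = aNumber;
    return self;
}
- (OwnedRValue *)evaluate:(NSDictionary *)symbolTable {
    return [[OwnedRValue alloc] initWithPayload:number];   // retain count 1
}
@end

@interface OwnedSum : OwnedExpression {
    OwnedExpression *left;    // weak references, as for all semantic objects
    OwnedExpression *right;
}
- (id)initLeft:(OwnedExpression *)aLeft right:(OwnedExpression *)aRight;
@end
@implementation OwnedSum
- (id)initLeft:(OwnedExpression *)aLeft right:(OwnedExpression *)aRight {
    if ((self = [super init])) { left = aLeft; right = aRight; }   // not retained
    return self;
}
- (OwnedRValue *)evaluate:(NSDictionary *)symbolTable {
    OwnedRValue *l = [left evaluate:symbolTable];                  // step 1
    OwnedRValue *r = [right evaluate:symbolTable];                 // step 2
    OwnedRValue *result = [[OwnedRValue alloc]
        initWithPayload:[l payload] + [r payload]];                // step 3
    [l release];                                                   // step 4
    [r release];
    return result;                                                 // step 5
}
@end

static NSInteger OwnedDemo(void) {
    OwnedConstant *two = [[OwnedConstant alloc] initWithNumber:2];
    OwnedConstant *three = [[OwnedConstant alloc] initWithNumber:3];
    OwnedSum *sum = [[OwnedSum alloc] initLeft:two right:three];
    OwnedRValue *result = [sum evaluate:nil];
    NSInteger answer = [result payload];   // 5
    [result release];                      // the caller of evaluate: releases
    [sum release];
    [three release];
    [two release];
    return answer;
}
```

Because evaluate: does not start with alloc or copy, this ownership rule is a project convention rather than the standard Cocoa naming rule, which is why it has to be documented and followed consistently.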

Thursday, April 8, 2010

Objective-C Gedcom Library, Part One

As a precursor to writing DeadEnds software I implemented an Objective-C library that handles Gedcom. I have used this library to build a Gedcom validation program and plan to use it to build a replacement for the LifeLines program. In this post I'll describe parts of the Gedcom library.

TWGedcomSource and TWGedcomFile

TWGedcomSource is an abstract class representing a source of Gedcom records and the errors encountered while processing them. TWGedcomFile is a sub-class of TWGedcomSource that uses a text file as a Gedcom source. It has an initializer method that reads a file containing Gedcom data and creates the TWGedcomRecordSet declared in TWGedcomSource. If any errors are encountered they are added to the TWGedcomErrors object.

@interface TWGedcomSource : NSObject

{

TWGedcomRecordSet* recordSet;

TWGedcomErrors* errors;

}

@end

@interface TWGedcomFile : TWGedcomSource

- (id) initWithContentsOfFile: (NSString*) path;

@end

TWGedcomFile's initWithContentsOfFile: method is the method that reads a Gedcom file into sets of Gedcom records. A Gedcom file can be in a number of different character set encodings (e.g., ASCII, ANSEL, UTF-8), and this initializer method calls a number of other methods to help determine that set. After determining the character set, the initializer reads the file into an NSString using that encoding. Since ANSEL is a character set that is not supported by NSString, code in this class converts ANSEL to Unicode when needed. At the end of initialization, if all went well, the recordSet member variable, defined in TWGedcomSource, is initialized to the set of Gedcom records found in the file. If there were errors, the errors member variable will hold the list of errors encountered.

TWGedcomRecordSet

A TWGedcomRecordSet contains sets of Gedcom records: one for each of the officially defined (Gedcom 5.5) record types, one for any other, custom records that are found in the source, and one for records that were found to have errors too serious to be put in one of the other record sets. The records are created by calling the class's initializer method, initWithString:errors:, which takes an NSString containing Gedcom data, and converts that string into the sets of Gedcom records. The records are indexed by their cross reference keys. The records are also validated for a wide variety of error conditions, and any errors found are recorded in the TWGedcomErrors object passed to the initializer. The TWGedcomRecordSet initializer is called by the TWGedcomFile initializer after that initializer has analyzed the character set of the original file and read it into an NSString, which is always in Unicode format.

- (id) initWithString: (NSString*) string errors: (TWGedcomErrors*) errors;

This is the initializer that converts an NSString into sets of Gedcom records and validates those records for errors.

- (TWGedcomPersonRecord*) personWithKey: (NSString*) key;

- (TWGedcomFamilyRecord*) familyWithKey: (NSString*) key;

- (TWGedcomSourceRecord*) sourceWithKey: (NSString*) key;

These three methods retrieve the records with the given key.

- (NSArray*) personsWithName: (NSString*) name;

- (NSArray*) personsWithNameKey: (NSString*) nameKey;

These two methods retrieve the set of persons who match a given name or Soundex name key. Person records are indexed by name as well as by key, and the indexing is done based on Soundex. When using the method personsWithName:, the name string argument can be loosely formatted. There are two requirements on the string: first, that the Soundex code of the person can be computed from it; and second, that the characters found in the string are a subset of the characters in the actual name, and appear in the same order as in the name. Both of these methods return an NSArray of all person records that have NAME lines matching the name argument.
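
The second requirement, that the characters of the loose string appear in the actual name in the same order, is a subsequence test. Here is one possible sketch of it. NameIsOrderedSubset is a hypothetical helper, not part of the library, and the decisions to ignore case, spaces and the Gedcom surname slashes are assumptions made for this example. The Soundex requirement is not covered.

```objc
#import <Foundation/Foundation.h>

// Returns YES if every significant character of `loose` occurs in `actual`,
// in the same order. Case, spaces and '/' in `loose` are ignored (assumption).
static BOOL NameIsOrderedSubset(NSString *loose, NSString *actual) {
    NSString *l = [loose lowercaseString];
    NSString *a = [actual lowercaseString];
    NSUInteger j = 0;   // current position in the actual name
    for (NSUInteger i = 0; i < [l length]; i++) {
        unichar c = [l characterAtIndex:i];
        if (c == ' ' || c == '/') continue;
        // Advance through the actual name until this character is found.
        while (j < [a length] && [a characterAtIndex:j] != c) j++;
        if (j == [a length]) return NO;   // ran out of name: no match
        j++;                              // consume the matched character
    }
    return YES;
}
```

With this helper, "tom wetmore" matches the NAME value "Thomas Trask /Wetmore/", while "wetmore tom" does not because the order is wrong.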

- (TWGedcomPersonRecord*) fatherOfPerson: (TWGedcomPersonRecord*) person;

- (TWGedcomPersonRecord*) motherOfPerson: (TWGedcomPersonRecord*) person;

- (NSArray*) childrenOfPerson: (TWGedcomPersonRecord*) person;

- (TWGedcomFamilyRecord*) natalFamilyOfPerson: (TWGedcomPersonRecord*) person;

- (NSArray*) natalFamiliesOfPerson: (TWGedcomPersonRecord*) person;

- (NSArray*) spousalFamiliesOfPerson: (TWGedcomPersonRecord*) person;

- (NSArray*) spousesOfPerson: (TWGedcomPersonRecord*) person;

These methods retrieve persons related to a given person, or families that the given person belongs to in some role.

- (TWGedcomPersonRecord*) husbandOfFamily: (TWGedcomFamilyRecord*) family;

- (TWGedcomPersonRecord*) wifeOfFamily: (TWGedcomFamilyRecord*) family;

- (NSArray*) childrenOfFamily: (TWGedcomFamilyRecord*) family;

- (NSArray*) husbandsOfFamily: (TWGedcomFamilyRecord*) family;

- (NSArray*) wivesOfFamily: (TWGedcomFamilyRecord*) family;

These methods return persons who play the indicated roles in families.

TWGedcomRecord

The TWGedcomRecord class is both the superclass for record classes that have distinct behavior and content (person, family and source records), and the concrete class for the other record types. This is perhaps unconventional and may change.

The class has a key class method:

+ (NSArray*) gedcomRecordsFromStrings: (NSArray*) strings maxCount: (NSInteger) maxCount errors: (TWGedcomErrors*) errors;

This method reads an array of NSStrings, each containing a Gedcom line, and converts the strings into an array of TWGedcomRecords (using the subclass types for persons, families and sources). This method does the character-level parsing of the Gedcom data, converting it first into trees of TWGedcomNodes and then creating a TWGedcomRecord for each tree. When done, the method returns the set of Gedcom records as an NSArray of TWGedcomRecords. This method adds any errors encountered to the TWGedcomErrors object. This class method is called by the TWGedcomRecordSet initializer. Once that initializer gets the array of records, it places each record in its specific subset and indexes it. The TWGedcomRecordSet initializer also does more extensive checks on the set of records as a whole, which may add more errors to the TWGedcomErrors object.
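
As a rough idea of what character-level parsing of one Gedcom line involves, the sketch below splits a line into its level, optional cross reference key, tag and optional value. ParseGedcomLine and its dictionary keys are hypothetical and are not the library's API; the real method builds TWGedcomNodes and reports problems through TWGedcomErrors, whereas this sketch simply returns nil on a malformed line.

```objc
#import <Foundation/Foundation.h>

// Splits "level [@key@] tag [value]" into a dictionary with the keys
// "level", "key" (optional), "tag" and "value" (optional).
// Returns nil if the line has no level number or no tag.
static NSDictionary *ParseGedcomLine(NSString *line) {
    NSScanner *scanner = [NSScanner scannerWithString:line];
    NSCharacterSet *space = [NSCharacterSet whitespaceCharacterSet];

    NSInteger level = 0;
    if (![scanner scanInteger:&level] || level < 0) return nil;

    NSString *first = nil;
    if (![scanner scanUpToCharactersFromSet:space intoString:&first]) return nil;

    NSMutableDictionary *fields = [NSMutableDictionary dictionary];
    fields[@"level"] = @(level);

    NSString *tag = first;
    if ([first hasPrefix:@"@"]) {
        // A cross reference key comes before the tag on record-level lines.
        fields[@"key"] = first;
        if (![scanner scanUpToCharactersFromSet:space intoString:&tag]) return nil;
    }
    fields[@"tag"] = tag;

    // Whatever remains on the line is the value.
    NSString *value = nil;
    if ([scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet]
                                intoString:&value]) {
        value = [value stringByTrimmingCharactersInSet:space];
        if ([value length] > 0) fields[@"value"] = value;
    }
    return fields;
}
```

For example, "0 @I1@ INDI" yields level 0, key @I1@ and tag INDI, and "1 NAME Thomas /Wetmore/" yields level 1, tag NAME and the value Thomas /Wetmore/. Grouping lines into trees then follows from the level numbers.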

TWGedcomPersonRecord, TWGedcomFamilyRecord, TWGedcomSourceRecord

Three of the Gedcom record types have their own classes in the Objective-C library, these being the person (INDI), family (FAM) and source (SOUR) records. This allows them to have methods specific to their own type.

TWGedcomNode

A TWGedcomNode holds the data of a single line of Gedcom. A TWGedcomNode is a subclass of the TWNode class, which is the class that DeadEnds software uses to hold XML-based data. Because of this, Gedcom data within a DeadEnds program can be treated as if it were XML-based, including being written or transmitted in XML format.

Wednesday, March 31, 2010

The DeadEnds Interpreter, Part Three

This (rather long) post describes the main Objective-C classes used in the implementation of the DeadEnds programming language interpreter. When program files are parsed the interpreter builds up a semantic representation of the modules (functions and procedures) defined in the program. This representation is a tree of objects; most of the objects are instances of different Expression and Statement classes implemented in Objective-C.

There are three main classes used to build the internal representations of programs -- Module, Expression and Statement. The sections that follow describe these three classes and many of their sub-classes.

Modules

Each function or procedure parsed from a DeadEnds program results in a Module object being created that is added to a table of Modules. As execution of the program proceeds, expressions in the module being interpreted may call other modules. The Objective-C interface for the Module class is:

@interface Module : NSObject

{

InterpType type;

NSString* name;

NSArray* parameters;

BlockStatement* body;

}

- (id) initWithType: (NSNumber*) type name: (NSString*) name parameters: (NSArray*) parameters block: (BlockStatement*) block;

- (ReturnCode) interpretArguments: (NSArray*) arguments symbolTable: (SymbolTable*) symbolTable;

@end

A Module has a return type, a name, a list of parameters and a BlockStatement. During execution Modules are called by CallExpressions. When a CallExpression is evaluated, the name in the Expression is used to look up the Module object in the Module table. The argument Expressions in the CallExpression are evaluated and assigned to the Module's parameters and used to initialize the symbol table that provides the context for interpreting the Module. The Module's BlockStatement is then interpreted.

The important method of the Module class is the interpretArguments:symbolTable: method that interprets the Module. The method requires two arguments, the list of argument Expressions to pass to the Module, and the SymbolTable at the base of the active SymbolTable chain that provides the execution context.

Expressions

Expressions are semantic objects created during parsing and they make up a substantial portion of the semantic trees. Generally each expression in the programs will be represented by an Expression object in the semantic representation. There is one abstract class that encompasses all Expression classes, another abstract sub-class for numeric constants, and a number of concrete Expression sub-classes for the different types of expressions found in the programming language. The simplified Objective-C interface to the Expression classes is:

@interface Expression : NSObject

- (RValue*) evaluate: (SymbolTable*) symbolTable;

@end

The Expression class is the base of all Expression sub-classes. It handles line numbers (part of error handling, not shown), and defines the primary method that all Expression sub-classes must implement. This is the evaluate: method, which evaluates an Expression in the context of a chain of SymbolTables. Every evaluate: method returns an RValue object, an object that wraps one of the 15 (very likely more later) types supported by the language. Evaluation of Expressions does not modify the Expressions; Expressions are semantic objects that are part of the internal semantic representation of Modules. Evaluation of an Expression creates a dynamic RValue object that represents the run-time value of the Expression in the context of the current programming state, which is encoded in the states of the accessible SymbolTables.

@interface Constant : Expression

{

NSNumber* number;

}

- (id) initWithNumber: (NSNumber*) number;

@end

The Constant class is the base of all numeric constants. A Constant is a container for an NSNumber, a foundation object that can hold any kind of basic numeric value.

@interface IntegerConstant : Constant

@end

@interface FloatConstant : Constant

@end

@interface BooleanConstant : Constant

@end

@interface CharacterConstant : Constant

@end

@interface StringConstant : Expression

{

NSString* string;

}

- (id) initWithString: (NSString*) string;

@end

A StringConstant holds a single string as an NSString, another foundation object. Evaluation of the numeric Constants and StringConstants consists of simply returning a copy of their values wrapped in RValue objects.

@interface Variable : Expression

{

NSString* name;

}

- (id) initWithName: (NSString*) name;

@end

A Variable object holds the name of a program variable. Evaluating a Variable object consists of looking the variable's name up in the current set of SymbolTables and returning the variable's current RValue. As we will see later SymbolTables map variable names to their LValues, and LValues refer to their corresponding RValues. There should be a little more written somewhere to discuss the differences between LValues and RValues and how the concepts are used during evaluation and interpretation.

@interface BinaryExpression : Expression

{

Expression* leftExpression;

Expression* rightExpression;

Operator operator;

}

- (id) initLeft: (Expression*) left right: (Expression*) right operator: (NSNumber*) operator;

- (RValue*) evaluateDotExpression: (SymbolTable*) symbolTable;

- (RValue*) evaluateAssignExpression: (SymbolTable*) symbolTable;

- (RValue*) evaluateAndOrExpression: (SymbolTable*) symbolTable;

- (RValue*) evaluateRelationalExpression: (SymbolTable*) symbolTable;

- (RValue*) evaluateArithmeticExpression: (SymbolTable*) symbolTable;

@end

A BinaryExpression holds an expression with left side and right side sub-Expressions and an operator. Evaluating a BinaryExpression involves recursively evaluating the two sub-expressions and then using the operator to compute the final RValue. Five methods are used for the five major types of BinaryExpressions. Dot-expressions implement method calls and structure field access; assign-expressions are assignment statements; and-or-expressions are logical expressions (their evaluation uses the usual short-circuiting rules which may avoid evaluating the right sub-expression); relational-expressions compare the sub-expressions; and arithmetic expressions perform arithmetic operations. Evaluation of BinaryExpressions may involve implicit type coercion rules.
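
The short-circuiting rule for and-or-expressions can be illustrated with operands that count how often they are evaluated. This is a hedged sketch rather than the body of evaluateAndOrExpression: CountingOperand, Operand and EvaluateAndOr are hypothetical names, operands are plain BOOLs instead of RValues, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical operand that records how many times it has been evaluated.
@interface CountingOperand : NSObject
@property (nonatomic, assign) BOOL truth;
@property (nonatomic, assign) NSUInteger evaluations;
- (BOOL)evaluate;
@end
@implementation CountingOperand
- (BOOL)evaluate {
    self.evaluations += 1;
    return self.truth;
}
@end

static CountingOperand *Operand(BOOL truth) {
    CountingOperand *operand = [[CountingOperand alloc] init];
    operand.truth = truth;
    return operand;
}

// isAnd selects logical-and semantics; otherwise logical-or.
static BOOL EvaluateAndOr(CountingOperand *left, CountingOperand *right, BOOL isAnd) {
    BOOL leftValue = [left evaluate];
    if (isAnd && !leftValue) return NO;    // false and x: right side is skipped
    if (!isAnd && leftValue) return YES;   // true or x: right side is skipped
    return [right evaluate];               // otherwise the right side decides
}

// Returns how many times the shared right operand was actually evaluated.
static NSUInteger ShortCircuitDemo(void) {
    CountingOperand *right = Operand(YES);
    EvaluateAndOr(Operand(NO), right, YES);    // skipped
    EvaluateAndOr(Operand(YES), right, NO);    // skipped
    EvaluateAndOr(Operand(YES), right, YES);   // evaluated
    return right.evaluations;                  // 1
}
```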

@interface UnaryExpression : Expression

{

Expression* expression;

Operator operator;

}

- (id) initExpression: (Expression*) expression operator: (NSNumber*) operator;

@end

UnaryExpressions hold unary expressions, those with an operator (+, -, !) and a sub-Expression. Evaluating a UnaryExpression involves recursively evaluating the sub-Expression and then applying the operator to that value.

@interface CallExpression : Expression

{

NSString* name;

NSArray* arguments;

}

- (id) initWithName: (NSString*) name arguments: (NSArray*) arguments;

@end

CallExpressions are function and procedure calls (Method calls are handled by BinaryExpressions). Each CallExpression consists of the name of the Module to call and a list of arguments to pass in, where each is an Expression in its own right. Evaluation of a CallExpression involves looking up the name in a table to get a Module object, evaluating the arguments and binding them to the Module's parameters in a new symbol table, and then interpreting the Module's BlockStatement. If the BlockStatement terminates with a ReturnStatement with an Expression, the ReturnStatement evaluates that Expression into an RValue, and that RValue becomes the final value of the CallExpression. Not as complicated as it sounds.

Statements

Statements are semantic objects created during parsing and, along with Expressions, are the other main object found in semantic trees. Each language statement parsed from a program is represented as a Statement object in a semantic tree. Statements are recursive because many Statements (e.g., if-statements, while-statements, block statements) contain other statements. The simplified Objective-C interface for the Statement classes is:

@interface Statement : NSObject

- (ReturnCode) interpret: (SymbolTable*) symbolTable;

@end

The Statement class is the abstract base class for all Statement sub-classes. It handles line numbers (part of error handling, not shown) and defines the key method that all Statement sub-classes must implement. That is the interpret: method, that interprets (runs, executes) a Statement in the context of a set of symbol tables. Every interpret: method returns a return code used to direct follow-on actions. Normally the return code indicates the normal case and interpretation continues to the next Statement in the program. Describing the other return codes is a little advanced for this introduction.

@interface ExpressionStatement: Statement

{

Expression* expression;

}

- (id) initWithExpression: (Expression*) expression;

@end

An ExpressionStatement is syntactically an Expression followed by a semicolon. For example, an assignment statement is an ExpressionStatement where the Expression is a BinaryExpression with '=' as its operator. Interpretation of an ExpressionStatement consists of evaluating the Expression for whatever side effects it causes (e.g., the change of a value in a symbol table).

@interface IfStatement: Statement

{

Expression* conditional;

Statement* thenStatement;

Statement* elseStatement;

}

- (id) initWithConditional: (Expression*) conditional thenStatement: (Statement*) thenStatement elseStatement: (Statement*) elseStatement;

@end

IfStatements implement conventional if-then-else Statements where the else Statement is optional. Interpretation of an IfStatement begins by evaluating the conditional Expression that is coerced to a Boolean value. If that value is true the then Statement is recursively interpreted; if that value is false and the else Statement exists, the else Statement is recursively interpreted.

@interface WhileStatement: Statement

{

Expression* conditional;

Statement* statement;

}

- (id) initWithConditional: (Expression*) conditional statement: (Statement*) statement;

@end

WhileStatements implement conventional while statement semantics. Interpretation of a WhileStatement begins by evaluating the conditional Expression and coercing its value to a Boolean. If the value is true interpretation continues with the interpretation of the body Statement; after the Statement is interpreted the Expression is reevaluated in the new context and the Statement will be interpreted again if the Expression's value is still true. This continues until the value of the Expression is false, at which point interpretation of the WhileStatement ends. Note that ReturnStatements, ContinueStatements and BreakStatements found within the body Statement implement the conventional rules for such statements.
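
One way the return codes could drive a loop is sketched below. The post does not list the actual ReturnCode values, so the four LoopCode values here are assumptions, as are the names InterpretWhile and WhileDemo; blocks stand in for the conditional Expression and the body Statement.

```objc
#import <Foundation/Foundation.h>

// Hypothetical return codes; the interpreter's real ReturnCode values are not shown in the post.
typedef NS_ENUM(NSInteger, LoopCode) {
    LoopCodeNormal,
    LoopCodeBreak,
    LoopCodeContinue,
    LoopCodeReturn
};

// Interprets a while loop given a conditional and a body that reports a code.
static LoopCode InterpretWhile(BOOL (^conditional)(void), LoopCode (^body)(void)) {
    while (conditional()) {
        LoopCode code = body();
        if (code == LoopCodeBreak) break;          // leave the loop, continue after it
        if (code == LoopCodeReturn) return code;   // propagate up to the enclosing Module
        // Normal and Continue both go back to re-evaluate the conditional.
    }
    return LoopCodeNormal;
}

// Counts how many iterations run to completion when one continues and one breaks.
static NSInteger WhileDemo(void) {
    __block NSInteger i = 0;
    __block NSInteger completed = 0;
    InterpretWhile(^BOOL{ return i < 10; }, ^LoopCode{
        i++;
        if (i == 2) return LoopCodeContinue;   // skip the rest of this iteration
        if (i == 5) return LoopCodeBreak;      // stop the loop early
        completed++;
        return LoopCodeNormal;
    });
    return completed;   // iterations 1, 3 and 4 complete: 3
}
```

A break is absorbed by the loop, while a return has to be passed upward so that every enclosing Statement also stops.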

@interface DoStatement: Statement

{

Statement* statement;

Expression* conditional;

}

- (id) initWithStatement: (Statement*) statement conditional: (Expression*) conditional;

@end

DoStatements implement conventional do-while statement semantics. Interpretation of DoStatements is similar to that of WhileStatements except that the order is reversed: the body Statement is interpreted first and the conditional Expression is evaluated afterwards, so the body always runs at least once. As in WhileStatements, any ReturnStatement, ContinueStatement or BreakStatement found in the body Statement is handled in the conventional manner.

@interface BlockStatement : Statement

{

NSArray* statements;

}

- (id) initWithStatements: (NSArray*) statements;

@end

A BlockStatement is simply an array of Statements. A BlockStatement is interpreted by interpreting each of its sub-Statements, from first to last, in sequence.

@interface Declaration : Statement

{

InterpType type;

NSArray* initializers;

}

- (id) initWithType: (NSNumber*) type initializers: (NSArray*) initializers;

@end

Declarations are Statements that declare new Variables and optionally assign them their initial values. In the DeadEnds programming language Declarations are treated exactly as other Statements and can appear anywhere any other Statement can occur. Interpretation of a Declaration causes the Variables to be added to the SymbolTable associated with the BlockStatement the Declaration is in (or to the global SymbolTable if the Declaration is outside of any Module definition). If a Variable has an initializing Expression, the Expression is evaluated normally, that is, in the context of the current SymbolTable, and the new Variable is given that initial RValue in the SymbolTable. Initializing Expressions can refer to Variables that were declared earlier in the same BlockStatement. This is exactly what one would hope would happen.

@interface ReturnStatement: Statement

{

Expression* returnExpression;

}

- (id) initWithExpression: (Expression*) expression;

@end

ReturnStatements represent return statements in DeadEnds programs. When a ReturnStatement is interpreted control returns from the current Module to the CallExpression in the Module that called it. If the ReturnStatement has an Expression (it is optional), the Expression is first evaluated in the context of the current set of SymbolTables, and the resulting RValue is returned to the calling Module as the value of the CallExpression.

@interface ContinueStatement: Statement

@end

What you'd expect, the semantic object representing a continue statement in a DeadEnds program. Interpreting a ContinueStatement causes interpretation control to move to the next iteration of the WhileStatement or DoStatement the ContinueStatement is embedded in.

@interface BreakStatement: Statement

@end

What you'd expect, the semantic object representing a break statement in a DeadEnds program. Interpreting a BreakStatement causes interpretation control to break out of the current control structure using the conventional rules for break statements.

Values

Values are values created at run-time when program Modules are being interpreted. Values are always the result of evaluating Expressions within the context of a chain of symbol tables. There are two types of values, RValues and LValues, that are described in more detail below. The simplified Objective-C interface to the two value classes is:

@interface RValue : NSObject

{

InterpType type;

id value;

BOOL hasLValue;

}

- (id) initWithType: (InterpType) type value: (id) value isLValue: (BOOL) isLValue;

- (id) initWithInterpValue: (RValue*) interpValue;

- (BOOL) boolValue;

- (RValue*) coerceType: (InterpType) toType;

@end

RValue objects are computed by the interpreter at run-time by evaluating Expressions. An RValue can hold any one of the many (currently 15) value types supported by the DeadEnds programming language (Boolean, Character, Integer, Float, String, List, Set, Table, Node, Record, Person, Family, Void, Any and Error). The type field indicates the specific type, and the value field holds a foundation object (usually an NSNumber or NSString) that holds the actual value. The hasLValue field is true if the RValue is referenced by an LValue, which generally means it is the value of a Variable. The RValue class has two methods to initialize an RValue, a method to return the coercion of the value to a Boolean value, and the coerceType: method that attempts to coerce the RValue from one type to another. If the coercion is not possible the method returns an RValue with the Error type.

@interface LValue : NSObject
{
    RValue* rValue;
}
- (id) initWithRValue: (RValue*) anRValue;
@end

LValues are objects that hold references to RValues. LValues are used in SymbolTables. A SymbolTable is a map (implemented by an NSMutableDictionary, another Foundation class) from the names of Variables to their LValues. In a conventional compiler an l-value is the actual location in memory where an r-value is stored. In the object-oriented DeadEnds interpreter, an LValue becomes a reference (effectively a location) to another object, the RValue that holds the Variable's actual value.

When an assignment BinaryExpression is evaluated, the RValue of the Variable on the left side of the assignment operator is changed to hold the RValue computed by evaluating the Expression on the right side of the operator. Since the Variable's RValue is pointed to by its LValue, this change to the RValue effectively changes the value of the Variable itself through the LValue mechanism. Note that the evaluation logic does not otherwise need to know whether an RValue is the target of an LValue. The only reason the evaluation logic cares about the existence of LValues is to issue a warning when an assignment statement would fail to change the program's state because the left-hand side is not a Variable or a component of a structured Variable. Again, it's not nearly as complicated as it sounds.
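The assignment mechanism can be sketched in a few lines. This is an illustration under stated assumptions, not the interpreter's code: RValue is cut down to a bare payload, setRValue: and assignVariable() are hypothetical names, a plain NSMutableDictionary stands in for a SymbolTable, and the sketch swaps the reference held by the LValue rather than mutating the old RValue in place. It assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Stand-in for RValue carrying only a payload; assumes ARC.
@interface RValue : NSObject
{
    id value;
}
- (id) initWithValue: (id) aValue;
- (id) value;
@end

@implementation RValue
- (id) initWithValue: (id) aValue
{
    if ((self = [super init])) value = aValue;
    return self;
}
- (id) value { return value; }
@end

@interface LValue : NSObject
{
    RValue* rValue;
}
- (id) initWithRValue: (RValue*) anRValue;
- (RValue*) rValue;
- (void) setRValue: (RValue*) anRValue;   // hypothetical setter
@end

@implementation LValue
- (id) initWithRValue: (RValue*) anRValue
{
    if ((self = [super init])) rValue = anRValue;
    return self;
}
- (RValue*) rValue { return rValue; }
- (void) setRValue: (RValue*) anRValue { rValue = anRValue; }
@end

// Hypothetical helper: the effect of evaluating "name = <right-hand side>".
// Returns NO when the name has no LValue, the case that would earn a warning.
static BOOL assignVariable(NSMutableDictionary* symbols, NSString* name, RValue* newValue)
{
    LValue* lValue = [symbols objectForKey: name];
    if (lValue == nil) return NO;
    [lValue setRValue: newValue];   // the Variable now refers to the new RValue
    return YES;
}
```

Because the dictionary keeps holding the same LValue object, any code that later looks up the name sees the new RValue without the dictionary itself being touched.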

Symbol Tables

SymbolTables are run-time objects that map Variable names to their LValues, and the LValues in turn refer to their RValues. At any given time there is a chain of active SymbolTables. At the top of the chain is the global SymbolTable that holds any global Variables that were defined outside of any Module. At the bottom of the chain is the SymbolTable holding the Variables declared in the deepest BlockStatement currently being interpreted. Between the top SymbolTable and the bottom SymbolTable are intermediate SymbolTables holding the Variables defined in any open and pending BlockStatements between the current location in the program and the BlockStatement at the Module level. Note that if a BlockStatement does not have any DeclarationStatements, the interpreter does not create a SymbolTable for that BlockStatement.

When looking up a Variable, the SymbolTable at the bottom of the chain is searched first. If the Variable cannot be found there, the remaining SymbolTables are searched one by one, up the chain, until the global SymbolTable is reached and searched. If the Variable cannot be found after all SymbolTables have been checked, the Variable is undefined and an Error object is returned.

@interface SymbolTable : NSObject
{
    NSMutableDictionary* nameValueTable;
    SymbolTable* parent;
}
- (id) initWithParent: (SymbolTable*) parent;
- (BOOL) contains: (NSString*) name;
- (LValue*) valueOfName: (NSString*) name;
- (void) addDeclaration: (Declaration*) declaration;
- (void) setValue: (LValue*) lValue ofName: (NSString*) name;
- (RValue*) lookup: (NSString*) name;
@end
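The bottom-up search described above can be sketched as follows. This is a guess at one reasonable implementation, not the actual DeadEnds code. It assumes that contains: and valueOfName: consult only the receiving table while lookup: walks the parent chain, it omits addDeclaration: because the Declaration class is not shown here, and it uses a hypothetical errorValue factory with an isError flag in place of the real Error type. RValue and LValue are reduced to minimal stand-ins, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Minimal stand-ins; isError replaces the real Error type tag. Assumes ARC.
@interface RValue : NSObject
{
    id value;
    BOOL isError;
}
- (id) initWithValue: (id) aValue;
+ (RValue*) errorValue;   // hypothetical factory
- (id) value;
- (BOOL) isError;
@end

@implementation RValue
- (id) initWithValue: (id) aValue
{
    if ((self = [super init])) value = aValue;
    return self;
}
+ (RValue*) errorValue
{
    RValue* error = [[RValue alloc] initWithValue: @"undefined variable"];
    error->isError = YES;
    return error;
}
- (id) value { return value; }
- (BOOL) isError { return isError; }
@end

@interface LValue : NSObject
{
    RValue* rValue;
}
- (id) initWithRValue: (RValue*) anRValue;
- (RValue*) rValue;
@end

@implementation LValue
- (id) initWithRValue: (RValue*) anRValue
{
    if ((self = [super init])) rValue = anRValue;
    return self;
}
- (RValue*) rValue { return rValue; }
@end

@interface SymbolTable : NSObject
{
    NSMutableDictionary* nameValueTable;
    SymbolTable* parent;
}
- (id) initWithParent: (SymbolTable*) aParent;
- (BOOL) contains: (NSString*) name;
- (LValue*) valueOfName: (NSString*) name;
- (void) setValue: (LValue*) lValue ofName: (NSString*) name;
- (RValue*) lookup: (NSString*) name;
@end

@implementation SymbolTable
- (id) initWithParent: (SymbolTable*) aParent
{
    if ((self = [super init])) {
        nameValueTable = [[NSMutableDictionary alloc] init];
        parent = aParent;   // nil marks the global table at the top of the chain
    }
    return self;
}
// Assumption: these two methods look only at this table, not at its parents.
- (BOOL) contains: (NSString*) name
{
    return [nameValueTable objectForKey: name] != nil;
}
- (LValue*) valueOfName: (NSString*) name
{
    return [nameValueTable objectForKey: name];
}
- (void) setValue: (LValue*) lValue ofName: (NSString*) name
{
    [nameValueTable setObject: lValue forKey: name];
}
// Search this table first, then each parent in turn up to the global table.
- (RValue*) lookup: (NSString*) name
{
    for (SymbolTable* table = self; table != nil; table = table->parent) {
        LValue* lValue = [table valueOfName: name];
        if (lValue != nil) return [lValue rValue];
    }
    return [RValue errorValue];   // undefined in every table
}
@end
```

Searching from the innermost table outward gives the usual shadowing behavior: a Variable declared in a nested BlockStatement hides a global Variable of the same name until that block's SymbolTable is discarded.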
