Monday, March 29, 2010

The DeadEnds Interpreter, Part One

During March 2010, I designed the DeadEnds programming language and wrote most of the interpreter that executes programs in it. The language lets DeadEnds users write their own programs that access records in DeadEnds databases. It has a full set of types and operations for general-purpose computing, plus specialized types and operations for handling common genealogical entities such as persons, events and families.

The DeadEnds programming language is akin to the LifeLines report language I wrote, but it has a more conventional syntax, with declarations, statements and module definitions similar to those in C and Java. The LifeLines language was originally intended for describing genealogical reports, so users could generate custom, precisely formatted output. Many existing LifeLines programs do just this, producing complex HTML, PostScript and other text-based reports. But it soon became clear that the LifeLines language could serve many other purposes, and it quickly evolved into a general-purpose programming language whose data types make processing genealogical data particularly simple. The DeadEnds language starts from the functionality of the LifeLines language and extends it in several directions.

Data Types
The DeadEnds programming language supports these data types:
  • Boolean - with the constants true and false.
  • Character - Unicode characters.
  • Integer - signed integers.
  • Float - double-precision floating-point numbers.
  • String - strings of Unicode characters.
  • List - mutable list of values with stack, queue and array access.
  • Set - mutable set of values.
  • Table - mutable table (map, dictionary) that maps strings to values.
  • Node - a node in a genealogical (or other) record (e.g., a line in a Gedcom record or an element in an XML record).
  • Record - a record in a database, where the contents of the record can be treated as a tree of Nodes.
  • Any - any of the above.
  • Void - type with no values.
  • Error - used when run-time evaluation is impossible.
The language does not support user-defined types, though this feature could be added if it proves useful.
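The post doesn't say how the interpreter represents these types internally. A common approach is a tagged value: a type tag plus a host-language payload. The Python sketch below illustrates that idea under stated assumptions; `DType`, `Value`, `ERROR_VALUE` and `can_assign` are hypothetical names, and the rule that only a variable declared `Any` accepts values of other types is an assumption, not documented DeadEnds behavior.

```python
from dataclasses import dataclass
from enum import Enum, auto


class DType(Enum):
    """Hypothetical tags for the DeadEnds data types listed above."""
    BOOLEAN = auto()
    CHARACTER = auto()
    INTEGER = auto()
    FLOAT = auto()
    STRING = auto()
    LIST = auto()
    SET = auto()
    TABLE = auto()
    NODE = auto()
    RECORD = auto()
    ANY = auto()    # used only as a declared type; run-time values are concrete
    VOID = auto()   # a type with no values
    ERROR = auto()  # the result of an impossible run-time evaluation


@dataclass
class Value:
    """A run-time value: a type tag plus a host-language payload."""
    dtype: DType
    payload: object = None


# Assumed sentinel returned when run-time evaluation fails.
ERROR_VALUE = Value(DType.ERROR)


def can_assign(declared: DType, value: Value) -> bool:
    """Check a value against a variable's declared type (assumed rules)."""
    if value.dtype in (DType.VOID, DType.ERROR):
        return False  # nothing can be stored from a Void or failed evaluation
    return declared is DType.ANY or declared is value.dtype
```

In this sketch `can_assign(DType.ANY, Value(DType.SET, {"one"}))` is true while `can_assign(DType.INTEGER, Value(DType.STRING, "3"))` is false. A real interpreter might also allow conversions such as Integer to Float; this sketch deliberately does not.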

Declarations as Statements
The original syntax for declarations had the form:

Integer a, b, c;

a type followed by a list of variables. The variables were added to the symbol table with default values. Declarations were not treated as statements, and they had to precede all statements in a program block.

I have relaxed these restrictions in two ways. First, declarations can now have initializers, expressions that give the variables their initial values; for example:

Integer a = 1, b = 2, c = a + b;

Initializers are treated like any other expressions, so they are evaluated at run time and need not be constants. Because they are processed left to right, the initializer for c in the example above can use the values of a and b, so c has an initial value of 3. Initializers can also give initial values to structured types; for example:

Set s = {"one", "two", "three"};

initializes Set s with three string elements.
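To make the left-to-right rule concrete, here is a minimal Python sketch (not the DeadEnds implementation) of processing one multi-variable declaration. It assumes initializers have already been parsed into functions of the symbol table; `declare`, `Initializer` and `DEFAULT_FACTORIES` are hypothetical names, and the default values for uninitialized variables are guesses, since the post doesn't specify them.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a parsed initializer expression: a function that
# is evaluated against the current symbol table at run time.
Initializer = Callable[[Dict[str, object]], object]

# Assumed default values for declarations without initializers.
DEFAULT_FACTORIES: Dict[str, Callable[[], object]] = {
    "Boolean": bool, "Integer": int, "Float": float, "String": str,
    "List": list, "Set": set, "Table": dict,
}


def declare(type_name: str,
            items: List[Tuple[str, Optional[Initializer]]],
            symbols: Dict[str, object]) -> None:
    """Process one declaration such as `Integer a = 1, b = 2, c = a + b;`."""
    for name, init in items:  # strictly left to right
        if init is None:
            symbols[name] = DEFAULT_FACTORIES[type_name]()
        else:
            # The table already holds variables bound earlier in this same
            # declaration, so the initializer for c can see a and b.
            symbols[name] = init(symbols)


symbols: Dict[str, object] = {}
# Integer a = 1, b = 2, c = a + b;
declare("Integer", [("a", lambda t: 1),
                    ("b", lambda t: 2),
                    ("c", lambda t: t["a"] + t["b"])], symbols)
print(symbols)  # {'a': 1, 'b': 2, 'c': 3}
```

The same function handles the structured case: passing `("s", lambda t: {"one", "two", "three"})` with type `"Set"` binds `s` to a three-element set.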

I also removed the restriction that declarations must occur before statements. Declarations are now treated like any other statement type and can occur anywhere in a block. Each is interpreted as an assignment statement with the side effect of adding its variables to the block's symbol table.
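One way to picture declarations as statements is a small statement executor in which a declaration evaluates its initializer like an assignment and then adds an entry to the current block's symbol table. The Python sketch below is illustrative only: `Scope`, `Declare`, `Assign`, `Block` and `execute` are hypothetical names, and the choices that each nested block gets its own table chained to the enclosing one, and that assigning to an undeclared variable is an error, are assumptions rather than documented DeadEnds behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


class Scope:
    """One block's symbol table, chained to the enclosing block's table."""

    def __init__(self, parent: Optional[Scope] = None) -> None:
        self.table: Dict[str, object] = {}
        self.parent = parent

    def _owner(self, name: str) -> Scope:
        # Search this block, then each enclosing block, for the variable.
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.table:
                return scope
            scope = scope.parent
        raise NameError(f"undeclared variable {name!r}")

    def lookup(self, name: str) -> object:
        return self._owner(name).table[name]

    def assign(self, name: str, value: object) -> None:
        self._owner(name).table[name] = value


# Hypothetical stand-in for a parsed expression: evaluated against a scope.
Expr = Callable[[Scope], object]


@dataclass
class Declare:
    name: str
    init: Expr


@dataclass
class Assign:
    name: str
    expr: Expr


@dataclass
class Block:
    body: List[Union[Declare, Assign, Block]] = field(default_factory=list)


def execute(stmt: Union[Declare, Assign, Block], scope: Scope) -> None:
    if isinstance(stmt, Declare):
        # An assignment whose side effect is a new entry in this block's table.
        scope.table[stmt.name] = stmt.init(scope)
    elif isinstance(stmt, Assign):
        # Plain assignment: the variable must already be declared somewhere.
        scope.assign(stmt.name, stmt.expr(scope))
    elif isinstance(stmt, Block):
        inner = Scope(parent=scope)  # each block gets its own symbol table
        for s in stmt.body:
            execute(s, inner)
    else:
        raise TypeError(f"unknown statement {stmt!r}")


top = Scope()
execute(Declare("total", lambda s: 0), top)
execute(Block([
    Declare("a", lambda s: 1),
    Assign("a", lambda s: s.lookup("a") + 1),
    Declare("b", lambda s: s.lookup("a") * 10),  # declared after a statement
    Assign("total", lambda s: s.lookup("a") + s.lookup("b")),
]), top)
print(top.lookup("total"))  # 22
```

Because `b` is declared after an assignment, the inner block mixes declarations and statements freely. When the block finishes, its table holding `a` and `b` is discarded, while `total` in the enclosing table keeps the value 22.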
