Thursday, April 8, 2010

Objective-C Gedcom Library, Part One

As a precursor to writing DeadEnds software, I implemented an Objective-C library that handles Gedcom. I have used this library to build a Gedcom validation program and plan to use it to build a replacement for the LifeLines program. In this post I'll describe parts of the library.

TWGedcomSource and TWGedcomFile
TWGedcomSource is an abstract class representing a source of Gedcom records and the errors encountered while processing them. TWGedcomFile is a subclass of TWGedcomSource that uses a text file as its Gedcom source. Its initializer reads a file containing Gedcom data and creates the TWGedcomRecordSet declared in TWGedcomSource. Any errors encountered are added to the TWGedcomErrors object.

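#import <Foundation/Foundation.h>

@class TWGedcomRecordSet, TWGedcomErrors;

@interface TWGedcomSource : NSObject
{
    TWGedcomRecordSet* recordSet;   // the Gedcom records read from the source
    TWGedcomErrors* errors;         // errors found while reading and validating them
}
@end

@interface TWGedcomFile : TWGedcomSource
- (id) initWithContentsOfFile: (NSString*) path;
@end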


TWGedcomFile's initWithContentsOfFile: method reads a Gedcom file into sets of Gedcom records. A Gedcom file can use any of several character encodings (e.g., ASCII, ANSEL, UTF-8), so this initializer calls several helper methods to determine which one the file uses. It then reads the file into an NSString using that encoding. Because NSString does not support ANSEL, code in this class converts ANSEL to Unicode when needed. At the end of initialization, if all went well, the recordSet instance variable, declared in TWGedcomSource, holds the set of Gedcom records found in the file. If there were errors, the errors instance variable holds the list of errors encountered.
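The post does not show how the encoding is detected or how ANSEL is converted, so the following is only a minimal sketch in plain C (which is also valid Objective-C) of the two steps. The names detect_encoding, ansel_to_utf8 and the GedcomEncoding values are hypothetical. Detection assumes that a byte-order mark takes priority and that otherwise the value of the header's "1 CHAR" line (ANSEL, UTF-8, UNICODE or ASCII in Gedcom 5.5) names the encoding. The ANSEL converter handles only ASCII and five common combining diacritics. The point it illustrates is that ANSEL writes a diacritic before the letter it modifies, while Unicode puts the combining mark after it, so marks must be held back until the base letter has been written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding tags; the library's own names are not shown in the post. */
typedef enum { ENC_UNKNOWN, ENC_ASCII, ENC_ANSEL, ENC_UTF8, ENC_UTF16 } GedcomEncoding;

/* Does the n-byte buffer p start with the C string s? */
static int starts_with(const unsigned char *p, size_t n, const char *s) {
    size_t k = strlen(s);
    return n >= k && memcmp(p, s, k) == 0;
}

/* Guess the encoding: a byte-order mark wins; otherwise use the value of a
   "1 CHAR" line. (UTF-16 without a BOM is not detected by this byte scan.) */
GedcomEncoding detect_encoding(const unsigned char *buf, size_t len) {
    if (starts_with(buf, len, "\xEF\xBB\xBF")) return ENC_UTF8;
    if (starts_with(buf, len, "\xFF\xFE") || starts_with(buf, len, "\xFE\xFF")) return ENC_UTF16;
    for (size_t i = 0; i < len; i++) {
        if (i > 0 && buf[i - 1] != '\n' && buf[i - 1] != '\r') continue; /* line starts only */
        if (!starts_with(buf + i, len - i, "1 CHAR ")) continue;
        const unsigned char *v = buf + i + 7;
        size_t n = len - i - 7;
        if (starts_with(v, n, "ANSEL")) return ENC_ANSEL;
        if (starts_with(v, n, "UTF-8")) return ENC_UTF8;
        if (starts_with(v, n, "UNICODE")) return ENC_UTF16;
        if (starts_with(v, n, "ASCII")) return ENC_ASCII;
        return ENC_UNKNOWN;
    }
    return ENC_UNKNOWN;
}

/* Map a few ANSEL combining diacritics to Unicode combining marks (0 = not one). */
static unsigned ansel_combining(unsigned char b) {
    switch (b) {
    case 0xE1: return 0x0300; /* grave */
    case 0xE2: return 0x0301; /* acute */
    case 0xE3: return 0x0302; /* circumflex */
    case 0xE4: return 0x0303; /* tilde */
    case 0xE8: return 0x0308; /* diaeresis (umlaut) */
    default:   return 0;
    }
}

/* Convert ANSEL bytes to NUL-terminated UTF-8, moving each diacritic after its
   base letter. Returns the byte count, or -1 if out is too small. Other
   non-ASCII bytes become '?', and marks with no following letter are dropped. */
int ansel_to_utf8(const unsigned char *in, size_t len, char *out, size_t cap) {
    unsigned pending[8];
    size_t npending = 0, o = 0;
    if (cap == 0) return -1;
    for (size_t i = 0; i < len; i++) {
        unsigned mark = ansel_combining(in[i]);
        if (mark != 0) {                               /* hold the mark for the next letter */
            if (npending < 8) pending[npending++] = mark;
            continue;
        }
        if (o + 1 + 2 * npending + 1 > cap) return -1; /* letter, marks, NUL */
        out[o++] = in[i] < 0x80 ? (char)in[i] : '?';
        for (size_t k = 0; k < npending; k++) {        /* two-byte UTF-8 form */
            out[o++] = (char)(0xC0 | (pending[k] >> 6));
            out[o++] = (char)(0x80 | (pending[k] & 0x3F));
        }
        npending = 0;
    }
    out[o] = '\0';
    return (int)o;
}
```

For example, the ANSEL bytes E2 65 (acute accent, then "e") come out as "e" followed by U+0301 COMBINING ACUTE ACCENT, which is CC 81 in UTF-8.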


TWGedcomRecordSet

A TWGedcomRecordSet contains sets of Gedcom records: one for each of the officially defined (Gedcom 5.5) record types, one for any other, custom records found in the source, and one for records with errors too serious for them to be placed in any of the other record sets. The sets are created by the class's initializer method, initWithString:errors:, which takes an NSString containing Gedcom data and converts it into the sets of Gedcom records. The records are indexed by their cross-reference keys. They are also validated for a wide variety of error conditions, and any errors found are recorded in the TWGedcomErrors object passed to the initializer. The TWGedcomFile initializer calls the TWGedcomRecordSet initializer after it has determined the character set of the original file and read the file into an NSString, whose contents are always Unicode.
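As an illustration of how records might be sorted into these sets, here is a hypothetical C sketch (classify_record and the RecordSet values are invented names) that parses a record's level-0 line, such as "0 @I12@ INDI", extracts the cross-reference key used for indexing, and picks a set from the tag. It assumes INDI, FAM and SOUR records go to their own sets, lumps the other Gedcom 5.5 record tags (HEAD, TRLR, SUBM, SUBN, NOTE, REPO, OBJE) into one value for brevity, treats any other tag as custom, and sends a malformed line, or a person, family or source record without a key, to the error set.

```c
#include <string.h>

/* Hypothetical set identifiers mirroring the sets described above. */
typedef enum { SET_PERSON, SET_FAMILY, SET_SOURCE, SET_OTHER_STANDARD,
               SET_CUSTOM, SET_ERROR } RecordSet;

/* Is the n-character tag at p exactly equal to tag? */
static int tag_is(const char *p, size_t n, const char *tag) {
    return n == strlen(tag) && strncmp(p, tag, n) == 0;
}

/* Classify a level-0 line. The key (without @ signs) is copied into key,
   which must hold at least one byte; it is left empty if the line has none. */
RecordSet classify_record(const char *line, char *key, size_t keycap) {
    static const char *const standard[] = { "HEAD", "TRLR", "SUBM", "SUBN", "NOTE", "REPO", "OBJE" };
    key[0] = '\0';
    if (strncmp(line, "0 ", 2) != 0) return SET_ERROR;
    const char *p = line + 2;
    if (*p == '@') {                          /* "@KEY@ " precedes the tag */
        const char *end = strchr(p + 1, '@');
        if (end == NULL || end[1] != ' ') return SET_ERROR;
        size_t n = (size_t)(end - p - 1);
        if (n == 0 || n >= keycap) return SET_ERROR;
        memcpy(key, p + 1, n);
        key[n] = '\0';
        p = end + 2;
    }
    size_t tlen = strcspn(p, " \r\n");        /* the tag ends at a space or line end */
    if (tlen == 0) return SET_ERROR;
    RecordSet set = SET_CUSTOM;               /* any tag not recognized below */
    if (tag_is(p, tlen, "INDI")) set = SET_PERSON;
    else if (tag_is(p, tlen, "FAM")) set = SET_FAMILY;
    else if (tag_is(p, tlen, "SOUR")) set = SET_SOURCE;
    else
        for (size_t i = 0; i < sizeof standard / sizeof standard[0]; i++)
            if (tag_is(p, tlen, standard[i])) set = SET_OTHER_STANDARD;
    /* Persons, families and sources need a key so they can be indexed. */
    if ((set == SET_PERSON || set == SET_FAMILY || set == SET_SOURCE) && key[0] == '\0')
        return SET_ERROR;
    return set;
}
```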


- (id) initWithString: (NSString*) string errors: (TWGedcomErrors*) errors;


This is the initializer that converts an NSString into sets of Gedcom records and validates those records for errors.


- (TWGedcomPersonRecord*) personWithKey: (NSString*) key;
- (TWGedcomFamilyRecord*) familyWithKey: (NSString*) key;
- (TWGedcomSourceRecord*) sourceWithKey: (NSString*) key;


These three methods retrieve the person, family, or source record with the given key.


- (NSArray*) personsWithName: (NSString*) name;
- (NSArray*) personsWithNameKey: (NSString*) nameKey;


These two methods retrieve the persons who match a given name or Soundex name key. Person records are indexed by name as well as by key, and the name index is based on Soundex. When using personsWithName:, the name argument can be loosely formatted. The string must meet two requirements: first, the person's Soundex code must be computable from it; second, the characters in the string must be a subset of the characters in the actual name and appear in the same order. Both methods return an NSArray of all person records that have NAME lines matching the argument.
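The post does not say which Soundex variant the library uses. The plain C sketch below assumes the common American Soundex (first letter kept, following consonants coded 1 to 6, vowels separating repeated codes, H and W ignored, result padded to four characters) and implements the second requirement as an ordered-subsequence test. Both function names are hypothetical, and the case-insensitive comparison in ordered_subset is an assumption.

```c
#include <ctype.h>
#include <string.h>

/* American Soundex digit for a letter. 'h' marks H and W, which are skipped
   without separating equal codes; '0' marks vowels and Y, which do separate them. */
static char soundex_digit(char c) {
    switch (toupper((unsigned char)c)) {
    case 'B': case 'F': case 'P': case 'V': return '1';
    case 'C': case 'G': case 'J': case 'K': case 'Q': case 'S': case 'X': case 'Z': return '2';
    case 'D': case 'T': return '3';
    case 'L': return '4';
    case 'M': case 'N': return '5';
    case 'R': return '6';
    case 'H': case 'W': return 'h';
    default: return '0';
    }
}

/* Write a four-character Soundex code plus NUL into out.
   Non-letters are skipped; returns 0 if name contains no letters. */
int soundex(const char *name, char out[5]) {
    while (*name && !isalpha((unsigned char)*name)) name++;
    if (!*name) return 0;
    out[0] = (char)toupper((unsigned char)*name);
    char prev = soundex_digit(*name);
    int n = 1;
    for (name++; *name && n < 4; name++) {
        if (!isalpha((unsigned char)*name)) continue;
        char d = soundex_digit(*name);
        if (d == 'h') continue;                 /* H and W: keep prev as it is */
        if (d != '0' && d != prev) out[n++] = d;
        prev = d;
    }
    while (n < 4) out[n++] = '0';
    out[4] = '\0';
    return 1;
}

/* Return 1 if every character of query occurs in name in the same order,
   comparing case-insensitively (an assumption). */
int ordered_subset(const char *query, const char *name) {
    for (; *query; query++) {
        int q = tolower((unsigned char)*query);
        while (*name && tolower((unsigned char)*name) != q) name++;
        if (!*name) return 0;
        name++;
    }
    return 1;
}
```

With these, a loose query such as "J Smth" could be checked against the NAME value "John /Smith/": "Smth" and "Smith" share the Soundex code S530, and every character of the query appears in the name in order. Which part of the query and of the name gets Soundex-coded is left open here, since the post does not specify it.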


- (TWGedcomPersonRecord*) fatherOfPerson: (TWGedcomPersonRecord*) person;
- (TWGedcomPersonRecord*) motherOfPerson: (TWGedcomPersonRecord*) person;
- (NSArray*) childrenOfPerson: (TWGedcomPersonRecord*) person;
- (TWGedcomFamilyRecord*) natalFamilyOfPerson: (TWGedcomPersonRecord*) person;
- (NSArray*) natalFamiliesOfPerson: (TWGedcomPersonRecord*) person;
- (NSArray*) spousalFamiliesOfPerson: (TWGedcomPersonRecord*) person;
- (NSArray*) spousesOfPerson: (TWGedcomPersonRecord*) person;


These methods retrieve persons related to a given person, or families that the given person belongs to in some role.
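In Gedcom, links between persons and families run both ways: an INDI record names the family it was born into with FAMC and the families where the person is a spouse with FAMS, while a FAM record names its members with HUSB, WIFE and CHIL. A question like fatherOfPerson: can therefore be answered by following FAMC to the natal family and then HUSB. The plain C sketch below shows that navigation over a hypothetical, stripped-down model; Person, Family, father_of and mother_of are invented names, and each person has at most one FAMC link here, although natalFamiliesOfPerson: suggests the library allows several.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down records holding only the links used here. */
typedef struct {
    const char *key;    /* cross-reference key, e.g. "I3" */
    const char *famc;   /* key of the natal (FAMC) family, or NULL */
} Person;

typedef struct {
    const char *key;    /* e.g. "F1" */
    const char *husb;   /* HUSB person key, or NULL */
    const char *wife;   /* WIFE person key, or NULL */
} Family;

/* Linear searches stand in for the key index a record set would keep. */
static const Person *find_person(const Person *ps, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(ps[i].key, key) == 0) return &ps[i];
    return NULL;
}

static const Family *find_family(const Family *fs, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(fs[i].key, key) == 0) return &fs[i];
    return NULL;
}

/* A parent is the HUSB or WIFE of the person's natal family. */
static const Person *parent_of(const Person *p, int want_husband,
                               const Person *ps, size_t np, const Family *fs, size_t nf) {
    if (p == NULL || p->famc == NULL) return NULL;
    const Family *f = find_family(fs, nf, p->famc);
    if (f == NULL) return NULL;
    const char *k = want_husband ? f->husb : f->wife;
    return k != NULL ? find_person(ps, np, k) : NULL;
}

const Person *father_of(const Person *p, const Person *ps, size_t np, const Family *fs, size_t nf) {
    return parent_of(p, 1, ps, np, fs, nf);
}

const Person *mother_of(const Person *p, const Person *ps, size_t np, const Family *fs, size_t nf) {
    return parent_of(p, 0, ps, np, fs, nf);
}
```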


- (TWGedcomPersonRecord*) husbandOfFamily: (TWGedcomFamilyRecord*) family;
- (TWGedcomPersonRecord*) wifeOfFamily: (TWGedcomFamilyRecord*) family;
- (NSArray*) childrenOfFamily: (TWGedcomFamilyRecord*) family;
- (NSArray*) husbandsOfFamily: (TWGedcomFamilyRecord*) family;
- (NSArray*) wivesOfFamily: (TWGedcomFamilyRecord*) family;


These methods return persons who play the indicated roles in families.
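Gedcom 5.5 allows a FAM record at most one HUSB and one WIFE line but any number of CHIL lines, so childrenOfFamily: naturally returns an array; the plural husbandsOfFamily: and wivesOfFamily: presumably exist to cope with files that break the single-spouse rule, though the post does not say. The hypothetical C function below (role_keys and KEY_MAX are invented names) shows the underlying step on raw lines: collecting the key from every level-1 line with a given role tag, in file order.

```c
#include <stddef.h>
#include <string.h>

#define KEY_MAX 32   /* hypothetical limit on key length, including the NUL */

/* Collect the keys on every level-1 line with the given tag ("HUSB", "WIFE"
   or "CHIL") among the lines of one FAM record, in file order. The @ signs
   are stripped and at most max keys are stored. Returns the number stored. */
size_t role_keys(const char *const *lines, size_t nlines, const char *tag,
                 char keys[][KEY_MAX], size_t max) {
    size_t found = 0, tlen = strlen(tag);
    for (size_t i = 0; i < nlines && found < max; i++) {
        const char *l = lines[i];
        /* Require exactly "1 <tag> " so deeper levels and longer tags are skipped. */
        if (strncmp(l, "1 ", 2) != 0 || strncmp(l + 2, tag, tlen) != 0 || l[2 + tlen] != ' ')
            continue;
        const char *v = l + 3 + tlen;             /* e.g. "@I3@" */
        size_t n = strlen(v);
        if (n < 3 || v[0] != '@' || v[n - 1] != '@' || n - 2 >= KEY_MAX)
            continue;                             /* not a well-formed pointer */
        memcpy(keys[found], v + 1, n - 2);
        keys[found][n - 2] = '\0';
        found++;
    }
    return found;
}
```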


TWGedcomRecord

The TWGedcomRecord class is both the superclass for record classes that have distinct behavior and content (person, family and source records), and the concrete class for the other record types. This is perhaps unconventional and may change.


The class has a key class method:


+ (NSArray*) gedcomRecordsFromStrings: (NSArray*) strings maxCount: (NSInteger) maxCount errors: (TWGedcomErrors*) errors;


This method takes an array of NSStrings, each containing one Gedcom line, and converts them into an array of TWGedcomRecords (using the subclasses for persons, families and sources). It does the character-level parsing of the Gedcom data, first building trees of TWGedcomNodes and then creating a TWGedcomRecord for each tree, and returns the records as an NSArray. Any errors encountered are added to the TWGedcomErrors object. The TWGedcomRecordSet initializer calls this class method; once it has the array of records, it places each record in its specific subset and indexes it. The TWGedcomRecordSet initializer also checks the set of records as a whole more extensively, which may add more errors to the TWGedcomErrors object.
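The post does not show how lines become trees of TWGedcomNodes, but Gedcom's level numbers make the usual approach straightforward: remember the most recent node seen at each level, and attach each new line as a child of the latest node one level up. The C sketch below shows that technique; Node, parse_line, build_records and the size limits are hypothetical, and the value pointers refer into the input lines, which must outlive the nodes.

```c
#include <stdlib.h>
#include <string.h>

#define TAG_MAX 32     /* hypothetical tag length limit, including the NUL */
#define MAX_DEPTH 100  /* Gedcom levels run from 0 to 99 */

/* Hypothetical node; TWGedcomNode's real fields are not shown in the post. */
typedef struct Node {
    int level;
    char tag[TAG_MAX];
    const char *value;              /* points into the source line, or NULL */
    struct Node *child, *sibling;   /* first child; next node at the same level */
} Node;

/* Parse "LEVEL [@KEY@ ]TAG[ VALUE]" into a new node, or return NULL.
   The key is skipped here; a real node would keep it. */
static Node *parse_line(const char *line) {
    char *end;
    long level = strtol(line, &end, 10);
    if (end == line || level < 0 || level >= MAX_DEPTH || *end != ' ') return NULL;
    const char *p = end + 1;
    if (*p == '@') {
        p = strchr(p + 1, '@');
        if (p == NULL || p[1] != ' ') return NULL;
        p += 2;
    }
    size_t tlen = strcspn(p, " ");
    if (tlen == 0 || tlen >= TAG_MAX) return NULL;
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->level = (int)level;
    memcpy(n->tag, p, tlen);        /* calloc already zeroed the terminator */
    n->value = p[tlen] == ' ' ? p + tlen + 1 : NULL;
    return n;
}

void free_tree(Node *n) {
    while (n != NULL) {
        Node *next = n->sibling;
        free_tree(n->child);
        free(n);
        n = next;
    }
}

static int fail(Node *n, Node **roots, int nroots) {
    free(n);
    for (int i = 0; i < nroots; i++) free_tree(roots[i]);
    return -1;
}

/* A line at level L becomes the last child of the most recent line at level
   L-1; each level-0 line starts a new record tree. Returns the number of trees
   stored in roots, or -1 (after freeing them) on a bad line or level jump. */
int build_records(const char *const *lines, size_t nlines, Node **roots, int maxroots) {
    Node *latest[MAX_DEPTH];        /* latest[L] = most recent node at level L */
    int nroots = 0, depth = -1;
    for (size_t i = 0; i < nlines; i++) {
        Node *n = parse_line(lines[i]);
        if (n == NULL || n->level > depth + 1) return fail(n, roots, nroots);
        if (n->level == 0) {
            if (nroots == maxroots) return fail(n, roots, nroots);
            roots[nroots++] = n;
        } else {
            Node **slot = &latest[n->level - 1]->child;
            while (*slot != NULL) slot = &(*slot)->sibling;   /* keep file order */
            *slot = n;
        }
        latest[n->level] = n;
        depth = n->level;
    }
    return nroots;
}
```

Given the lines "0 @I1@ INDI", "1 NAME John /Smith/", "2 SURN Smith", "1 SEX M" and "0 TRLR", build_records returns two trees, with SURN under NAME, and NAME and SEX as children of INDI. A line that jumps more than one level deeper than its predecessor is rejected.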


TWGedcomPersonRecord, TWGedcomFamilyRecord, TWGedcomSourceRecord

Three of the Gedcom record types, person (INDI), family (FAM) and source (SOUR), have their own classes in the Objective-C library. This allows them to have methods specific to their own type.


TWGedcomNode

A TWGedcomNode holds the data of a single line of Gedcom. A TWGedcomNode is a subclass of the TWNode class, which is the class that DeadEnds software uses to hold XML-based data. Because of this, Gedcom data within a DeadEnds program can be treated as if it were XML-based, including being written or transmitted in XML format.
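The post does not describe how TWNode data is written as XML. One plausible mapping, shown here as a C sketch, turns each Gedcom line into an element named after its tag, with the line's value as an attribute and subordinate lines as nested elements. The XNode and Sink types, the write_xml function and the choice of a "value" attribute are all assumptions, not DeadEnds' actual format.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node; the post does not show TWNode's fields. */
typedef struct XNode {
    const char *tag;
    const char *value;                /* NULL if the line has no value */
    struct XNode *child, *sibling;
} XNode;

/* Fixed-size output buffer (cap >= 1), kept NUL-terminated; overflow is dropped. */
typedef struct { char *buf; size_t cap, len; } Sink;

static void put(Sink *s, const char *text) {
    for (; *text != '\0' && s->len + 1 < s->cap; text++) s->buf[s->len++] = *text;
    s->buf[s->len] = '\0';
}

/* Append text with the characters that are special in XML attributes escaped. */
static void put_escaped(Sink *s, const char *text) {
    for (; *text != '\0'; text++) {
        switch (*text) {
        case '&': put(s, "&amp;"); break;
        case '<': put(s, "&lt;"); break;
        case '>': put(s, "&gt;"); break;
        case '"': put(s, "&quot;"); break;
        default: { char c[2] = { *text, '\0' }; put(s, c); }
        }
    }
}

/* Write a node, its descendants and its following siblings as nested elements. */
void write_xml(Sink *s, const XNode *n) {
    for (; n != NULL; n = n->sibling) {
        put(s, "<"); put(s, n->tag);
        if (n->value != NULL) { put(s, " value=\""); put_escaped(s, n->value); put(s, "\""); }
        if (n->child != NULL) {
            put(s, ">"); write_xml(s, n->child);
            put(s, "</"); put(s, n->tag); put(s, ">");
        } else {
            put(s, "/>");
        }
    }
}
```

With this mapping, an INDI record with a NAME line and a nested SURN line becomes <INDI><NAME value="John /Smith/"><SURN value="Smith"/></NAME></INDI>.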


