mjc is the compiler for the MultiJava project.

This document is a work in progress and currently provides a place to collect things that should be added to the documentation. Some of these items are just guesses at this stage as I'm developing a deeper understanding of the Kopi architecture.

Document overall architecture:

The classes can be divided into four groups: classes related to parsing, AST classes, classes used for code generation, and utility classes. Instances of the AST classes are the output of the parser; instances of the code generation classes are the output of walking the ASTs. The ASTs are "self-walking" for the most part, meaning that the methods for performing an operation on the tree are coded into the node classes themselves. An accept method is provided, but the visitor pattern seems to be used only for unparsing the tree (for debugging purposes).
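The difference between a "self-walking" operation and the visitor-based unparser can be sketched as follows. This is only an illustration: every name in it (SketchNode, SketchVisitor, evaluate, and so on) is hypothetical and not taken from the Kopi or mjc sources, and evaluate merely stands in for a real operation such as typechecking.

```java
// Hypothetical visitor interface; in this sketch it is used only for unparsing.
interface SketchVisitor {
    void visitLiteral(SketchLiteral node);
    void visitAdd(SketchAdd node);
}

// Hypothetical base class standing in for an AST node class.
abstract class SketchNode {
    // "Self-walking": the operation is implemented in each node class
    // and recurses into the children directly.
    abstract int evaluate();

    // Hook for the visitor pattern.
    abstract void accept(SketchVisitor visitor);
}

final class SketchLiteral extends SketchNode {
    final int value;

    SketchLiteral(int value) { this.value = value; }

    @Override int evaluate() { return value; }

    @Override void accept(SketchVisitor visitor) { visitor.visitLiteral(this); }
}

final class SketchAdd extends SketchNode {
    final SketchNode left;
    final SketchNode right;

    SketchAdd(SketchNode left, SketchNode right) {
        this.left = left;
        this.right = right;
    }

    @Override int evaluate() { return left.evaluate() + right.evaluate(); }

    @Override void accept(SketchVisitor visitor) { visitor.visitAdd(this); }
}

// The unparser lives outside the node classes and walks the tree through accept.
final class SketchUnparser implements SketchVisitor {
    private final StringBuilder out = new StringBuilder();

    @Override public void visitLiteral(SketchLiteral node) { out.append(node.value); }

    @Override public void visitAdd(SketchAdd node) {
        out.append('(');
        node.left.accept(this);
        out.append(" + ");
        node.right.accept(this);
        out.append(')');
    }

    String result() { return out.toString(); }
}
```

Adding a new self-walking operation means touching every node class, whereas adding a new visitor does not; that trade-off is presumably why only the debugging unparser is kept separate.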

The JPhylum hierarchy contains all classes representing AST nodes.

The CType hierarchy represents all the types in Java.

The CContext hierarchy is used for control flow analysis and variable scoping during typechecking.

Classes of the CMember hierarchy represent the signatures of classes, interfaces, fields, and methods.

Compilation Passes:

PARSING

The ANTLR translator-generator tool is used for lexing and parsing. Classes related to parsing include:

TYPECHECKING

The three checking passes in the compilation sequence can be understood in terms of the JVM execution sequence [JLS 12]. The interface checking pass is analogous to loading and linking [JLS 12.2, 12.3]. The initializers checking pass is analogous to initialization of classes and interfaces [JLS 12.4]. Finally, the pass for checking bodies is analogous to creation of instances and execution [JLS 12.5].

CHECK INTERFACES

The primary purpose of this pass is to gather the type-signature information that the later passes need, for example so that the typecheck pass can find the generic function invoked by a method call. The pass covers everything that has a signature: imported class files, classes being compiled, methods, fields, and so on. This information is stored in a CCompilationUnit instance and in instances of CMember that are bound to the AST. The pass also checks the basic interfaces to make sure things generally look OK, moves augmenting methods into local classes if appropriate, and adds things like the default constructor and the initializer method to the AST.

CHECK INITIALIZERS

Checks the initializer methods that were created during the interface checking pass. The checks performed are like those in the interface checking pass. Additionally, full-blown typechecking is done on the static initializers using a context that, appropriately, does not contain information for the typechecking of instance members.

TYPECHECK INSTANCE MEMBERS

Typechecks the code. A series of context objects (instances of subclasses of CContext) is passed through the various typecheck methods of the AST nodes. The typechecking generally follows the control flow of the source code being compiled. The contexts are mutated to collect the information gathered during typechecking. For example, when a variable declaration is typechecked, the context for the scope of that variable is mutated to include the type information for the variable. Once typechecking of that scope is complete, the context for that scope is merged with the surrounding context and the information from that variable is "popped".
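The mutate-then-merge behavior of scope contexts can be illustrated with a small sketch. It is not the CContext API: the class SketchContext and all of its methods are hypothetical, and the choice of "assigned" facts as the information that survives the merge is an assumption made only for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical scope context; one instance per lexical scope.
final class SketchContext {
    private final SketchContext parent;                           // surrounding scope, or null
    private final Map<String, String> localTypes = new HashMap<>();
    private final Set<String> assigned = new HashSet<>();

    SketchContext(SketchContext parent) { this.parent = parent; }

    // Typechecking a declaration mutates the context of the enclosing scope.
    void declare(String name, String type) { localTypes.put(name, type); }

    // Lookup walks outward through the surrounding contexts.
    String lookupType(String name) {
        for (SketchContext c = this; c != null; c = c.parent) {
            String type = c.localTypes.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }

    void markAssigned(String name) { assigned.add(name); }

    boolean isAssigned(String name) {
        for (SketchContext c = this; c != null; c = c.parent) {
            if (c.assigned.contains(name)) {
                return true;
            }
        }
        return false;
    }

    // Merge into the surrounding context: facts about variables that outlive
    // this scope are copied outward; variables declared here are "popped".
    void close() {
        if (parent == null) {
            return;
        }
        for (String name : assigned) {
            if (!localTypes.containsKey(name)) {
                parent.assigned.add(name);
            }
        }
    }
}
```

In this sketch a block that declares y and assigns both x (from the outer scope) and y leaves only the fact about x behind once close is called.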

In the typecheck pass we treat external methods as belonging to the classes they augment and handle multiple dispatch as if it were part of the target language. In other words, the typecheck pass mimics the semantics of MultiJava, not the implementation.

GENERATE CODE

to be completed

Random scraps of documentation that need a new home

Handling Open Classes

Open members are imported into a compilation unit using a mechanism similar to Java's import mechanism. In the current version of the compiler a separate keyword include is used to import compilation units declaring open members. This is an implementation convenience and we expect to use the Java import keyword in the future.

As with Java's import mechanism, when a compilation unit U includes an external generic function M, this effectively alters the search space for resolving identifiers in U's source code. Thus the identifier M appearing in U may refer to a member declared in the included generic function. The behavior described in the rest of this paragraph is not yet implemented. If U declares a class C that is augmented by M, then the members declared in M are considered members of C. That is, a (direct or transitive) subclass D of C inherits the members declared in M even if D does not include M. On the other hand, if U declares a class S that is not augmented by M and does not subclass (either directly or transitively) any class augmented by M, then subclasses of S cannot reference members of M without including M.

For typechecking method calls we view the receiver classes as if they actually contain the augmenting methods. See CAugmentationMap, CGFCollectionMap, CGenericFunctionCollection, CMethodSet, and CClass for more information on this process.

For typechecking external method bodies we group the external methods into anchor classes, represented by MJGenericFunctionDecl instances. When typechecking the individual external methods in an anchor class we create a special context instance of type CExtMethodContext that facilitates the typechecking and code generation for references to this within the external method. I.e., this appearing in an external method refers to the receiver class. This context and the anchor class also provide the outer class for inner classes declared by type declaration statements or anonymous inner class declarations within the external method.

Implicit Import of External Generic Functions

Implicit import is currently disabled for ease of implementation.

One of the challenges in handling open classes is the need to implicitly import external generic functions. As with Java classes, implicit import is needed in two cases. External methods of the same package as the client are imported when referenced. External methods may also be implicitly imported from another package. This happens when two things are true: there is a package import statement in the client compilation unit and the client code references an external method defined in the imported package.

The key to implicit import is that a search for possible external methods must be performed whenever a method identifier is processed. A method identifier can be used in two ways: in a method declaration and in a method call. For a method declaration, the new method may be specializing an implicitly imported external method. For a method call, the method may be external. This last point is true even if an internal method of the same name and static argument types exists. In practice the implicitly imported external methods are recorded in a hashtable. This technique allows for efficient lookup of external methods; only the first reference requires disk accesses.
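The hashtable-backed lookup can be sketched as a simple memoizing cache. All names here (ExternalMethodCacheSketch, find, diskSearch) are hypothetical, the String values stand in for whatever the compiler actually records, and caching of failed searches is an assumption of the sketch rather than something these notes confirm.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical cache of implicitly imported external methods.
final class ExternalMethodCacheSketch {
    private final Map<String, Optional<String>> cache = new HashMap<>();
    private final Function<String, Optional<String>> diskSearch; // stand-in for the search on disk
    private int diskAccesses = 0;

    ExternalMethodCacheSketch(Function<String, Optional<String>> diskSearch) {
        this.diskSearch = diskSearch;
    }

    // Called whenever a method identifier is processed, whether it appears
    // in a declaration or in a call.
    Optional<String> find(String methodName) {
        Optional<String> cached = cache.get(methodName);
        if (cached != null) {
            return cached;                      // later references: no disk access
        }
        diskAccesses++;                         // first reference: search on disk
        Optional<String> found = diskSearch.apply(methodName);
        cache.put(methodName, found);           // assumption: misses are cached too
        return found;
    }

    int diskAccesses() { return diskAccesses; }
}
```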

This implicit import algorithm ensures that the top method of every specialized generic function is found during the checkInterface pass. Thus at the end of the checkInterface pass we are able to group multimethods by generic function. This is discussed further below.

Privileged Access Modifiers and Augmenting Methods

The privileged access checking code for kjc was quite buggy. This code was refactored to fix the bugs and to handle MultiJava's open members. One change is the addition of a CMemberHost interface. This interface is implemented by CCompilationUnit and CClass, which represent the two container objects for member declarations. Additionally, CMember and its subclasses have an added field, host, that indicates the lexically-nearest container for the member's declaration. The owner field is used to represent the logical owner of the member, e.g., the CClass of an augmenting method's receiver.

For method accessibility checks, the compiler passes the host of the caller to the isAccessibleFrom predicate of the callee. The privileged access modifiers of the host, owner, and callee are all considered in this predicate. Similar checks are performed for other member references (e.g., fields and classes).
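The host/owner distinction can be made concrete with a deliberately simplified sketch. The types below are hypothetical stand-ins, not the real CMemberHost, CCompilationUnit, CClass, or CMember classes, and the rules in the predicate are assumptions chosen for illustration: the sketch looks only at the callee's own modifier and its host, whereas the real predicate also considers the modifiers of the host and owner.

```java
// Hypothetical stand-in for a container of member declarations.
interface HostSketch {
    String packageName();
}

// Stand-in for a compilation unit acting as a host.
final class UnitSketch implements HostSketch {
    private final String pkg;

    UnitSketch(String pkg) { this.pkg = pkg; }

    @Override public String packageName() { return pkg; }
}

// Stand-in for a class acting as a host (and as a logical owner).
final class ClassSketch implements HostSketch {
    private final String pkg;

    ClassSketch(String pkg) { this.pkg = pkg; }

    @Override public String packageName() { return pkg; }
}

enum AccessSketch { PRIVATE, PACKAGE, PUBLIC }

final class MemberSketch {
    final AccessSketch access;
    final HostSketch host;    // lexically nearest container of the declaration
    final ClassSketch owner;  // logical owner, e.g. the receiver class

    MemberSketch(AccessSketch access, HostSketch host, ClassSketch owner) {
        this.access = access;
        this.host = host;
        this.owner = owner;
    }

    // Assumed, simplified rules: private means "same host",
    // package-private means "same package as the host".
    boolean isAccessibleFrom(HostSketch callerHost) {
        switch (access) {
            case PUBLIC:
                return true;
            case PACKAGE:
                return host.packageName().equals(callerHost.packageName());
            default:
                return host == callerHost;
        }
    }
}
```

Under these assumed rules, a private augmenting method hosted by a compilation unit is accessible from that unit but not from its receiver class, even though the receiver is its owner.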

Handling Multimethods

This code is currently ripe for refactoring; treat this description accordingly.

We use the term dispatcher wrapping for the process of grouping multimethods by generic function. The term is motivated by the fact that all multimethods in a given context that extend the same generic function must be wrapped in a new Java method. This method performs dynamic dispatch to determine which of the available multimethods is appropriate for a given invocation. Dispatcher wrapping happens following the checkInitializers pass after all method signatures in all source files have been resolved.

Dispatcher wrapping must be performed in several cases. Any methods that use multiple dispatch (i.e., that have dynamic dispatch annotations) must be wrapped in a dispatcher method. A lone external method must also be wrapped in a dispatcher method. In this case the wrapping accommodates the implementation strategy of passing the receiver of a method call as an additional argument. Finally, a lone internal method of an external generic function must also be wrapped. This wrapping supports the implementation strategy for external generic functions where all specializing methods are part of a chain of responsibility [Gamma et al.].
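As a rough picture of what a dispatcher does, the plain-Java sketch below wraps two multimethods of one generic function in a single method that selects between them at run time. It is hand-written for illustration and is not the code mjc generates; the class and method names are hypothetical, and the receiver is shown as an explicit first argument in the spirit of the external-method strategy described above.

```java
// Hypothetical class hierarchy used as argument types.
class SketchShape {}
class SketchRectangle extends SketchShape {}
class SketchCircle extends SketchShape {}

final class IntersectSketch {
    private IntersectSketch() {}

    // Body of the most general multimethod of the generic function.
    private static String intersectShapeShape(SketchShape self, SketchShape other) {
        return "generic";
    }

    // Body of a multimethod specialized on both the receiver and the argument.
    private static String intersectRectRect(SketchRectangle self, SketchRectangle other) {
        return "rect-rect";
    }

    // Dispatcher: tests the dynamic types of all dispatched positions and
    // forwards to the most specific applicable multimethod.
    static String intersect(SketchShape self, SketchShape other) {
        if (self instanceof SketchRectangle && other instanceof SketchRectangle) {
            return intersectRectRect((SketchRectangle) self, (SketchRectangle) other);
        }
        return intersectShapeShape(self, other);
    }
}
```

With only two multimethods the order of tests is trivial; a real dispatcher has to order its tests so that more specific multimethods are tried before more general ones.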

The dispatcher wrapping algorithm uses two new AST node classes, MJGenericFunctionDispatcher and MJExtGenericFunctionDispatcher. The first is used to wrap methods of an internal generic function; these methods are necessarily internal. The MJExtGenericFunctionDispatcher class wraps methods of an external generic function. The wrapped methods may be internal or external, but all the methods in a single wrapper will be of the same sort. The compiler also uses two new CMethod subclasses, CSourceDispatcherMethod and CSourceExtDispatcherMethod. These subclasses store the information collected after typechecking that is needed for bytecode generation.

Handling Class and Generic Function Imports

How Kopi Handles Class Imports

CTopLevel is a collection of static methods that together act like a class loader. CTopLevel maintains a mapping from class names to CClass singletons. It uses a hashtable to record the classes loaded during the compilation process. CTopLevel includes methods for finding classes in the class path and for registering the signatures of classes being compiled in the current session.
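A registry in this style might look roughly like the following. The sketch does not reproduce CTopLevel: the class names, the method names register and load, and the fromSource flag are all hypothetical, and the class-path search is replaced by a stub.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-class singleton.
final class ClassInfoSketch {
    final String qualifiedName;
    final boolean fromSource;   // true if compiled in the current session

    ClassInfoSketch(String qualifiedName, boolean fromSource) {
        this.qualifiedName = qualifiedName;
        this.fromSource = fromSource;
    }
}

// Static-only registry acting like a class loader.
final class TopLevelSketch {
    private static final Map<String, ClassInfoSketch> LOADED = new HashMap<>();

    private TopLevelSketch() {}

    // Registers the signature of a class being compiled in this session.
    static void register(ClassInfoSketch info) {
        LOADED.put(info.qualifiedName, info);
    }

    // Returns the singleton for a name, searching the class path on first use.
    static ClassInfoSketch load(String qualifiedName) {
        return LOADED.computeIfAbsent(qualifiedName, TopLevelSketch::findOnClassPath);
    }

    // Stub: a real implementation would locate and read a class file here.
    private static ClassInfoSketch findOnClassPath(String qualifiedName) {
        return new ClassInfoSketch(qualifiedName, false);
    }
}
```

Because every lookup goes through the same map, two requests for the same name always yield the same object, which is what makes identity comparison of class representations safe.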

CClassType and its subclasses represent the types of objects. These representatives contain references to the same CClass singletons stored in CTopLevel. The lookup() method of CClassType provides a mapping from type names to CClassNameType singletons. Each CClassNameType singleton is associated with its CClass singleton on demand during compilation. This association is triggered by the CClassNameType.checkType() method.
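The on-demand association can be pictured with the sketch below. It is an assumption-laden illustration: NameTypeSketch is a hypothetical class, the signatures of the real lookup() and checkType() methods are not given in these notes and are not reproduced here, and the classFinder parameter is invented so that the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical name-based type that is bound to its class lazily.
final class NameTypeSketch {
    private static final Map<String, NameTypeSketch> TYPES = new HashMap<>();

    private final String name;
    private Object boundClass;   // stand-in for the class singleton; null until checked

    private NameTypeSketch(String name) { this.name = name; }

    // One singleton per type name; no class is loaded at this point.
    static NameTypeSketch lookup(String name) {
        return TYPES.computeIfAbsent(name, NameTypeSketch::new);
    }

    boolean isBound() { return boundClass != null; }

    // The first check triggers resolution; later checks reuse the binding.
    Object checkType(Function<String, Object> classFinder) {
        if (boundClass == null) {
            boundClass = classFinder.apply(name);
            if (boundClass == null) {
                throw new IllegalStateException("cannot resolve type " + name);
            }
        }
        return boundClass;
    }
}
```

Deferring the binding lets a type name be mentioned before the class it denotes has been registered, which matters when source files in one session refer to each other.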

Extensions to Kopi to Handle Generic Function Imports

CTopLevel includes storage and static methods to track all external generic functions. The methods for manipulating and querying this data are analogous to those for the CClass singletons described above.