This document is a work in progress and currently provides a place to collect things that should be added to the documentation. Some of these items are just guesses at this stage as I'm developing a deeper understanding of the Kopi architecture.
Basically the classes can be divided into four groups: classes related to parsing, AST classes, classes used for code generation, and utility classes. Instances of the AST classes are the output of the parser; instances of the code generation classes are the output of walking the ASTs. The ASTs are "self-walking" for the most part, by which I mean the methods for performing an operation on the tree are coded into the node classes. An accept method is provided, but the visitor pattern seems to be used only for unparsing the tree (for debugging purposes).
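To make the "self-walking" idea concrete, here is a minimal sketch contrasting an operation coded into the node classes with a visitor used only for unparsing. All names here (Node, Literal, Add, NodeVisitor, Unparser) are hypothetical stand-ins and are not the JPhylum classes or the Kopi visitor interface.

```java
// Hypothetical node hierarchy; none of these names come from Kopi.
interface NodeVisitor {
    void visitLiteral(Literal node);
    void visitAdd(Add node);
}

abstract class Node {
    // "Self-walking": the operation is a method of the node class itself.
    abstract int evaluate();

    // Visitor entry point, used in this sketch only for unparsing.
    abstract void accept(NodeVisitor visitor);
}

final class Literal extends Node {
    final int value;
    Literal(int value) { this.value = value; }
    @Override int evaluate() { return value; }
    @Override void accept(NodeVisitor visitor) { visitor.visitLiteral(this); }
}

final class Add extends Node {
    final Node left;
    final Node right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    // The node recurses into its own children; no external walker is involved.
    @Override int evaluate() { return left.evaluate() + right.evaluate(); }
    @Override void accept(NodeVisitor visitor) { visitor.visitAdd(this); }
}

// The only visitor: turns the tree back into source-like text for debugging.
final class Unparser implements NodeVisitor {
    private final StringBuilder out = new StringBuilder();

    @Override public void visitLiteral(Literal node) { out.append(node.value); }

    @Override public void visitAdd(Add node) {
        out.append('(');
        node.left.accept(this);
        out.append(" + ");
        node.right.accept(this);
        out.append(')');
    }

    String text() { return out.toString(); }
}
```

In this arrangement adding a new operation means touching every node class, which is presumably the trade-off the compiler accepts in exchange for keeping each node's logic in one place.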
The JPhylum hierarchy contains all classes representing AST nodes.
The CType hierarchy represents all the types in Java.
The CContext hierarchy is used for control flow analysis and variable scoping during typechecking.
Classes of the CMember hierarchy represent the signatures of classes, interfaces, fields, and methods.
The ANTLR translator-generator tool is used for lexing and parsing. Classes related to parsing include:
The three checking passes in the compilation sequence can be understood in terms of the JVM execution sequence [JLS 12.]. The interface checking pass is analogous to loading and linking [JLS 12.2, 12.3]. The initializers checking pass is analogous to initialization of classes and interfaces [JLS 12.4]. Finally, the pass for checking bodies is analogous to creation of instances and execution [JLS 12.5].
The primary purpose of this pass is to gather information about type signatures so that the typecheck pass can do things like finding the generic function invoked by a method call. The pass checks the basic interfaces to make sure things generally look OK and moves augmenting methods into local classes if appropriate. It gathers information about the type signatures of everything needed for the later passes (imported class files, classes being compiled, methods, fields, etc.). This information is stored in a CCompilationUnit instance and in instances of CMember that are bound to the AST. The pass also adds things like the default constructor and the initializer method to the AST.
Checks the initializer methods that were created during the interface checking pass. The checks performed are like those in the interface checking pass. Additionally, full-blown typechecking is done on the static initializers using a context that, appropriately, does not contain information for the typechecking of instance members.
Typechecks the code. A series of context classes (subclasses of
CContext) are passed through the various typecheck methods of the AST
nodes. The typechecking generally follows the control flow of the
source code being compiled. The contexts are mutated to collect the
information gathered during typechecking. For example, when a variable
declaration is typechecked the context for the scope of that variable
is mutated to include the type information for the variable. Once
typechecking of that scope is complete the context for that scope is
merged with the surrounding context and the information from that
variable is "popped".
In the typecheck pass we treat external methods as belonging to the classes they augment and handle multiple dispatch as if it were part of the target language. In other words, the typecheck pass mimics the semantics of MultiJava, not the implementation.
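The way contexts are mutated, merged, and "popped" during the typecheck pass can be sketched roughly as follows. ScopeContext and its methods are hypothetical and much simpler than the real CContext subclasses; the reachability flag is only my stand-in for the kind of control flow information that survives a merge.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a CContext subclass; not the real kjc class.
final class ScopeContext {
    private final ScopeContext parent;                           // surrounding scope, or null
    private final Map<String, String> locals = new HashMap<>();  // variable name -> type name
    private boolean reachable = true;                            // sample flow information

    ScopeContext(ScopeContext parent) { this.parent = parent; }

    // Typechecking a variable declaration mutates the context of its scope.
    void declare(String name, String type) {
        if (locals.containsKey(name)) {
            throw new IllegalStateException("redeclared: " + name);
        }
        locals.put(name, type);
    }

    // Resolve a name by searching this scope, then the surrounding scopes.
    String lookup(String name) {
        for (ScopeContext c = this; c != null; c = c.parent) {
            String type = c.locals.get(name);
            if (type != null) return type;
        }
        return null;
    }

    void markUnreachable() { reachable = false; }

    boolean isReachable() { return reachable; }

    // Merge with the surrounding context: flow information propagates outward,
    // while the locals of this scope are simply dropped ("popped").
    ScopeContext close() {
        if (parent != null && !reachable) parent.reachable = false;
        return parent;
    }
}
```

A block statement would create a ScopeContext whose parent is the enclosing context, typecheck its statements against it, and call close() when done.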
to be completed
Open members are imported into a compilation unit using a mechanism
similar to Java's import mechanism. In the current version of the
compiler a separate keyword, include, is used to import compilation
units declaring open members. This is an implementation convenience
and we expect to use the Java import keyword in the future.
As with Java's import mechanism, when a compilation unit
U includes an external generic function M this
effectively alters the search space for resolving identifiers in
U's source code. Thus the identifier M appearing in
U may refer to a member declared in the included generic
function. The following functionality is not yet
implemented. If U declares a class C that is
augmented by M, then the members declared in M are
considered members of C. That is, a (direct or transitive)
subclass D of C inherits the members declared in
M even if D does not include M. On the
other hand, if U declares a class S that is not
augmented by M and does not subclass (either directly or
transitively) any class augmented by M, then subclasses of
S cannot reference members of M without including
M.
For typechecking method calls we view the receiver classes as if
they actually contain the augmenting methods. See CAugmentationMap,
CGFCollectionMap, CGenericFunctionCollection, CMethodSet, and CClass
for more information on this process.
For typechecking external method bodies we group the external
methods into anchor classes, represented by MJGenericFunctionDecl
instances. When typechecking the individual external methods in an
anchor class we create a special context instance of type
CExtMethodContext that facilitates the typechecking and code
generation for references to this within the external method. That
is, this appearing in an external method refers to the receiver
class. This context and the anchor class also provide the outer class
for inner classes declared by type declaration statements or
anonymous inner class declarations within the external method.
Implicit import is currently disabled for ease of implementation.
One of the challenges in handling open classes is the need to implicitly import external generic functions. As with Java classes, implicit import is needed in two cases. External methods of the same package as the client are imported when referenced. External methods may also be implicitly imported from another package. This happens when two things are true: there is a package import statement in the client compilation unit and the client code references an external method defined in the imported package.
The key to implicit import is that a search for possible external methods must be performed whenever a method identifier is processed. A method identifier can be used in two ways: in a method declaration and in a method call. For a method declaration the new method may be specializing an implicitly imported external method. For a method call the method may be external. This last point is true even if an internal method of the same name and static argument types exists. In practice the implicitly imported external methods are recorded in a hashtable. This technique allows for efficient lookup of external methods; only the first reference requires disk accesses.
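The hashtable technique amounts to memoizing the disk search. A minimal sketch, assuming a lookup keyed by a qualified method name and a pluggable disk search function; ExternalMethodCache, its key format, and the choice to cache failed searches too are my assumptions and are not taken from the compiler.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical cache for implicitly imported external generic functions.
final class ExternalMethodCache {
    // Qualified method name -> location of its declaration, if one was found.
    private final Map<String, Optional<String>> cache = new HashMap<>();
    private final Function<String, Optional<String>> diskLookup;
    private int diskAccesses = 0;

    ExternalMethodCache(Function<String, Optional<String>> diskLookup) {
        this.diskLookup = diskLookup;
    }

    // Called whenever a method identifier is processed, whether it appears in
    // a method declaration or in a method call.
    Optional<String> find(String qualifiedName) {
        Optional<String> hit = cache.get(qualifiedName);
        if (hit == null) {
            diskAccesses++;                      // only the first reference goes to disk
            hit = diskLookup.apply(qualifiedName);
            cache.put(qualifiedName, hit);       // assumption: misses are cached as well
        }
        return hit;
    }

    int diskAccesses() { return diskAccesses; }
}
```

Caching the misses matters here because most method identifiers will not name an external method, and without it every ordinary call would trigger a fresh search.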
This implicit import algorithm ensures that the top method of every specialized generic function is found during the checkInterface pass. Thus at the end of the checkInterface pass we are able to group multimethods by generic function. This is discussed further below.
The privileged access checking code for kjc was quite buggy. This
code was refactored to fix the bugs and to handle MultiJava's open
members. One change is the addition of a CMemberHost interface. This
interface is implemented by CCompilationUnit and CClass, which
represent the two container objects for member declarations.
Additionally, CMember and its subclasses have an added field, host,
that indicates the lexically-nearest container for the member's
declaration. The owner field is used to represent the logical owner
of the member, e.g., the CClass of an augmenting method's receiver.
For method accessibility checks, the compiler passes the host of
the caller to the isAccessibleFrom predicate of the callee. The
privileged access modifiers of the host, owner, and callee are all
considered in this predicate. Similar checks are performed for the
accessibility checks on other member references (e.g., fields and
classes).
This code is currently ripe for refactoring; treat this description accordingly.
We use the term dispatcher wrapping for the process of grouping multimethods by generic function. The term is motivated by the fact that all multimethods in a given context that extend the same generic function must be wrapped in a new Java method. This method performs dynamic dispatch to determine which of the available multimethods is appropriate for a given invocation. Dispatcher wrapping happens following the checkInitializers pass after all method signatures in all source files have been resolved.
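Grouping multimethods by generic function can be pictured as bucketing method signatures under a key shared by all methods of one generic function. In this sketch the key is the receiver class, the method name, and the static parameter types; MethodSig, DispatcherGrouping, and that choice of key are assumptions made for illustration, and the compiler itself works on CMethod instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical signature record standing in for a resolved method signature.
final class MethodSig {
    final String receiver;            // class whose generic function this method extends
    final String name;
    final List<String> staticTypes;   // static parameter types
    final List<String> specializers;  // dynamic specializer per parameter

    MethodSig(String receiver, String name,
              List<String> staticTypes, List<String> specializers) {
        this.receiver = receiver;
        this.name = name;
        this.staticTypes = staticTypes;
        this.specializers = specializers;
    }

    // Methods that differ only in their specializers share a generic function.
    String genericFunctionKey() {
        return receiver + "." + name + staticTypes;
    }
}

final class DispatcherGrouping {
    // Runs once all signatures are resolved; each resulting bucket would be
    // wrapped in a single dispatcher method.
    static Map<String, List<MethodSig>> group(List<MethodSig> methods) {
        Map<String, List<MethodSig>> groups = new LinkedHashMap<>();
        for (MethodSig m : methods) {
            groups.computeIfAbsent(m.genericFunctionKey(), k -> new ArrayList<>()).add(m);
        }
        return groups;
    }
}
```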
Dispatcher wrapping must be performed in several cases. Any methods that use multiple dispatch (i.e., that have dynamic dispatch annotations) must be wrapped in a dispatcher method. A lone external method must also be wrapped in a dispatcher method. In this case the wrapping accommodates the implementation strategy of passing the receiver of a method call as an additional argument. Finally, a lone internal method of an external generic function must also be wrapped. This wrapping supports the implementation strategy for external generic functions where all specializing methods are part of a chain of responsibility [Gamma et al.].
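For the first case, the effect of wrapping can be approximated in plain Java. The sketch below is hand-written and only models two internal multimethods of one generic function; the class names, the body method names, and the use of instanceof tests are illustrative assumptions and do not show the bytecode the compiler actually emits. The external cases (receiver passed as an extra argument, chain of responsibility) are not modelled.

```java
// MultiJava source being modelled, shown only as a comment:
//
//   class Shape {
//       String intersects(Shape s)      { return "shape-shape"; }
//       String intersects(Shape@Rect r) { return "shape-rect"; }
//   }
class Shape {
    // Dispatcher: one ordinary Java method replaces both multimethods and picks
    // the most specific body from the dynamic type of the argument.
    String intersects(Shape other) {
        if (other instanceof Rect) {
            return intersects$body2((Rect) other);
        }
        return intersects$body1(other);
    }

    // The original multimethod bodies, renamed (hypothetical naming scheme).
    private String intersects$body1(Shape s) { return "shape-shape"; }

    private String intersects$body2(Rect r) { return "shape-rect"; }
}

class Rect extends Shape {}
```

A call whose argument has static type Shape but dynamic type Rect reaches the second body, which is the behavior plain Java overloading would not give.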
The dispatcher wrapping algorithm uses two new AST node classes,
MJGenericFunctionDispatcher and MJExtGenericFunctionDispatcher. The
first is used to wrap methods of an internal generic function; these
methods are necessarily internal. The MJExtGenericFunctionDispatcher
class wraps methods of an external generic function. The wrapped
methods may be internal or external, but all the methods in a single
wrapper will be of the same sort. The compiler also uses two new
CMethod subclasses, CSourceDispatcherMethod and
CSourceExtDispatcherMethod. These subclasses store the information
collected after typechecking that is needed for bytecode generation.
CTopLevel is a collection of static methods that together act like a
class loader. CTopLevel maintains a mapping from class names to CClass
singletons. It uses a hashtable to record the classes loaded during
the compilation process. CTopLevel includes methods for finding
classes in the class path and for registering the signatures of
classes being compiled in the current session.
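The shape of such a registry can be sketched as a class with only static members and a hashtable from qualified names to singletons. TopLevelRegistry, ClassInfo, and both method names are hypothetical; in particular the "load" step here just fabricates a ClassInfo, where the real code would search the class path.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical stand-in for a CClass singleton.
final class ClassInfo {
    final String qualifiedName;
    ClassInfo(String qualifiedName) { this.qualifiedName = qualifiedName; }
}

// Static-only registry acting like a class loader for the compiler.
final class TopLevelRegistry {
    private static final Map<String, ClassInfo> LOADED = new Hashtable<>();

    private TopLevelRegistry() {}  // no instances; all access is static

    // Register the signature of a class being compiled in this session.
    // Returns false if a class of that name is already known.
    static boolean addSourceClass(ClassInfo info) {
        return LOADED.putIfAbsent(info.qualifiedName, info) == null;
    }

    // Return the singleton for a name, creating it on first request.
    // Assumption: a real implementation would read a class file here.
    static ClassInfo loadClass(String qualifiedName) {
        return LOADED.computeIfAbsent(qualifiedName, ClassInfo::new);
    }
}
```

Because every lookup goes through the one table, two requests for the same name always yield the same object, which lets later passes compare classes by identity.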
CClassType and its subclasses represent the types of objects. These
representatives contain references to the same CClass singletons
stored in CTopLevel. The lookup() method of CClassType provides a
mapping from type names to CClassNameType singletons. The
CClassNameType singletons are associated with their CClass singletons
on demand during compilation. This association is triggered by the
CClassNameType.checkType() method.
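The on-demand association between a name type and its class can be sketched as lazy binding. NameType, ClassInfo, and the loader parameter are hypothetical; the signature of the real CClassNameType.checkType() is not given in these notes and is probably different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a CClass singleton.
final class ClassInfo {
    final String qualifiedName;
    ClassInfo(String qualifiedName) { this.qualifiedName = qualifiedName; }
}

// Hypothetical stand-in for a name type: one singleton per type name, bound
// to its class only when first checked.
final class NameType {
    private static final Map<String, NameType> TYPES = new HashMap<>();

    private final String name;
    private ClassInfo resolved;  // null until checkType() has run

    private NameType(String name) { this.name = name; }

    // Mapping from type names to singletons; does not load the class.
    static NameType lookup(String name) {
        return TYPES.computeIfAbsent(name, NameType::new);
    }

    boolean isResolved() { return resolved != null; }

    // Triggers the association with the class, once.
    ClassInfo checkType(Function<String, ClassInfo> loader) {
        if (resolved == null) {
            resolved = loader.apply(name);
        }
        return resolved;
    }
}
```

Deferring the binding means a type can be mentioned while signatures are still being gathered, before the class it names has been loaded or registered.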
CTopLevel includes storage and static methods to track all external
generic functions. The methods for manipulating and querying this
data are analogous to those for the CClass singletons described above.