Warning: I have not yet selected a license to distribute this stuff. It's a work-in-progress, remember? I'll let you view this stuff from my web page, but I do not give permission to redistribute in its current form.
This layer provides a C file scanner and a Lisp file scanner. Both
produce a database mapping Lisp function and variable names to files,
and a mapping from files to a Lisp form, usually of the form
(featurep 'foo) or the form t, that describes
when that file is accessible.
The C scanner must filter out comments, keep track of
#ifdef … #endif forms, scan for
DEFUN and DEFVAR, and scan for invocations of
Fprovide.
The Lisp scanner must store variants for both variables and functions. For variables, the variants are: constant and variable. For functions, the variants are: macro, subst, and function.
This problem is currently too hard to solve for the C sources. When my C source analyzer is ready, we can plug it in to do the same job. In the meantime, though, we punt on C code.
This layer provides a Lisp scanner that, for each function, produces a list of functions called and a list of variables accessed. These lists are then matched up with the name→file mapping to produce a list of file dependencies. Note that a called macro induces a build time dependency on the file containing the macro definition, and a runtime dependency on any functions called or variables accessed by the macro expansion.
We must be able to filter out calls to Emacs functions that
are not made when the code is used on XEmacs. This means that we have
to do some constant propagation, and know that (featurep
'xemacs) is always true, that running-xemacs is a
constant with value true, and also be able to filter out the various
string-match calls that determine flavor.
Using the information produced in layer 2, we next construct a call graph. The edges are two-way, to support analysis of code that throws/signals. That is, each function has a list of functions that it calls, as well as a list of functions that call it.
Next, we walk the call graph from the “bottom” (functions that don't call other functions) to the “top” (functions that are not called by other functions). As we walk, we check types on arguments versus expected types. Unfortunately, recursive and mutually recursive functions mess up this scheme. For such functions, we take our best guess, then reanalyze after pushing the best guess through. This is a far sight less sophisticated than the Haskell approach, but there's only so much we can do here. I suppose we could keep going around recursive loops until nothing changes …
TODO: Deal with throw/signal/error/…
The atomic types are those types defined in C where the name of the
type describes the type entirely. The atomic types include the type
null, whose only member is the constant nil.
Note that the constant nil is not a member of any other
type. This makes function type descriptions more verbose than they
would be otherwise, but enables checks for misuses of nil.
The atomic types also include the all-inclusive type t,
which is the ultimate parent type of all other types.
The nonatomic types optionally supply additional information beyond the name of the type.
A bound on type T indicates an upper or lower bound on type T
that is a subtype of number. Each bound takes one of the
following forms:
*), indicating the absence of any
bound.An interval on type T indicates a permitted range on type T, with a lower bound and an upper bound. An interval is a dotted pair of bounds. For example:
(-20 . 5) indicates the fixnums -20, -19, …,
5.(-20 . (5)) indicates the fixnums -20, -19, …,
4.(-20 . *) indicates the fixnums -20, -19, …,
most-positive-fixnum(0.0 . 10.0) indicates all floating point numbers ≥
0.0 and ≤ 10.0.(0.0 . (10.0)) indicates all floating point numbers
≥ 0.0 and < 10.0.(* . *) indicates all elements of type T.A sequence length is one of the following:
*, meaning the length is unknown or unbounded.Sequence element types are defined in one of the following ways:
(types type1 …): indicates that the sequence
elements are to be matched up with successive types from the list of
types. Each type in the list is a type descriptor. The keyword
:repeat may appear at most once in this list; if it is
not present, it is deemed to appear immediately after the symbol
types. That keyword indicates that all following type
descriptors should be repeated, in order, as many times as necessary
to account for all members of a list.Type descriptors are similar, but not identical, to Common Lisp type descriptors. In particular, an Elisp type descriptor is one of the following.
A symbol naming a type is a complete type descriptor. If the symbol names a nonatomic type, then the type parameters are deemed to range over all possible values.
Sequence types can be restricted with the following syntax.
(bit-vector length): indicates a bit-vector with a
particular length. Note that the type descriptors
bit-vector, (bit-vector *), and
(bit-vector (* . *)) are equivalent.(string length): indicates a sequence of characters of
the indicated length.(list type length): indicates a list, where all
elements of the list have the type indicated by type and
the list has the indicated length.(vector type length): similar to
list.In the usual Lisp type taxonomy, the types cons and
list differ in only one respect: nil is a
list, but not a cons. Since we have already
split null out as a separate type, this distinction is not
useful. We instead use the cons type to describe dotted
lists only; i.e., when the cdr of the cons cell is not
another cons cell. In particular, the type descriptor (cons car
cdr) contains type descriptors car and
cdr, where cdr is not a subtype of either
cons or list.
The numeric types can include an interval to show that a limited range of that type is allowed. For example:
(fixnum (0 . 1)): indicates that the value must be 0 or
1.(integer (-4000000000 . 4000000000)): indicates an
integer that is at at least -4 billion, but no more than 4
billion.(float (0.0 . (1.0))): indicates nonnegative floating
point numbers less than 1.0.A function type includes a parameter list, which is a list of type
descriptors, possibly interspersed with the keywords
&optional and &rest. As is the case
with Elisp function declarations, if both &optional and
&keyword appear in a parameter list, then
&rest must appear last. Both keywords must be followed
immediately by at least one type descriptor, and &rest
must be followed by exactly one type descriptor.
Type names fun, subr, and macro
correspond to definitions created with defun,
defsubr, and defmacro, respectively. Each can
appear as a bare symbol, meaning that any function is possible, or can
appear in a list of the form (fun parameters return-type),
where parameters is a list of parameter types, with
optional keywords as described above, and return-type
describes the return type of the function or macro.
A type descriptor of the form (member obj1 …)
indicates the type consisting exactly of the listed objects. The
predicate eql is used to compare objects with elements of
this list.
A type descriptor can indicate that a symbol is expected simply by
specifying the type symbol. However, in some cases, the
requirement is a symbol bound to a particular value or function. For
those cases, we use the following syntax.
(symbol :value type): specifies the type of the value
to which the symbol is bound.(symbol :function type): specifies the type of the
function binding for the symbol.(symbol :value type :function type): specifies
both.Boolean expressions can be used to combine type descriptors in the following ways.
(and type1 …): indicates the intersection of the
following type descriptors.(or type1 …): indicates the union of the
following type descriptors.(not type): indicates that any element not of the
indicated type may appear.For example, the type descriptor (and t (not null))
matches any object except nil.
A type variable is used to indicate that the same type must appear in
multiple places. Type variables are indicated with Lisp keywords; i.e.,
symbols that start with a colon. For example, the type descriptor
(fun (:A) :A) indicates a function of one parameter whose
return type is the same as the type of its parameter.
Type variables may be restricted by requiring them to stand for some
limited set of types. This is accomplished by wrapping the type
descriptor with a restrict clause, which lists all
restrictions on type variables appearing in an enclosed type descriptor.
For example:
(restrict ((:A float)) (fun (:A fixnum) :A)): indicates
a function of two parameters, where the first parameter must be some
subtype of float (e.g., (float (0.0 . (1.0)))) and the
second parameter must be a fixnum. The return value of the function
has the same type as its first argument.(restrict ((:A fixnum) (:B vector)) (fun (:A :B) (cons :A
:B))): indicates a function of two parameters. The first
parameter can have any type that is a subtype of fixnum (e.g.,
(fixnum (0 . 1))) and the second parameter can be of any
vector type. The return type of the function is a dotted list whose
car has the type of the first parameter and whose cdr has the type of
the second parameter.FIXME: They aren't just type variables! They are variables that match the corresponding parts of type descriptors. They match entire type descriptors in some places, but can be exploded.
FIXME: Describe the special cases:
:num-args(+ ? ?)(and integer (not fixnum)) == bignum(and fixnum bignum) == integer(vector bit) == bit-vector(integer (most-negative-fixnum . most-positive-fixnum)) ==
fixnum(vector character) == string ??? FIXME: which one?(vector string-char) == string ??? FIXME: which
one?(sequence type *) == (sequence type)(sequence t) == sequenceDescribes when definitions appear together as a group. We need a
syntax to say that a given provide causes a bunch of
symbols to appear together, or the loading of a given file, or, for C
definitions, a given preprocessor symbol. We may need subgroups, which
add a constraint to a parent group, in order to specify conditionally
defined symbols.