Declaration & resolution
Consider a fairly simple imperative language, which consists of declarations and statements. Declarations declare and define items, like constants, functions and types, and statements change the runtime program state in some way. Declarations may refer to each other and themselves in any arbitrary order, while statements are executed sequentially and may only refer to items created “before” themselves.
fun mainfor iin 0 upto 100 when odd?(i)do println(i)end end fun odd?(x:Int ) => xmod 2 =1
Because of this setup, name resolution must probably be a two-pass thing, where the compiler first finds all the names of declared items and only then resolves the names.
For a batch compiler, that's straightforward enough, but for a query-based compiler, like you might find in a language server, things are a bit more complicated.
Ideally, we want a query-based pass to be a pure-ish, idempotent operation such that we can easily cache them and only rerun the parts that have changed. Additionally, we probably don't want to run the declaration pass over every single file simultaneously, in order to be able to cache at a more granular level.
This brings us to an important first clue. When we are just declaring the items in a given file, we do not need to know what is going on in other files .1 The big win here is that name declaration can happen file-by-file. This seems to imply that (part of) the declaration pass should look like this:
decls_in_file:List (Ast ) ->List (Name )
with the AST list containing all of the trees in a given file. This function can be a dead simple recursive pattern match, and is easily pure and idempotent.
Note also that this function also doesn't need to care about statements and the names declared within those. Because of their sequential and progressively scoped nature, names declared in a statement block are very naturally handled by the resolution pass.
This isn't the whole story, however. If our language was a simple
statements-within-declarations kind of language, then the list of names
is a perfectly fine thing for the decls_in_file
function to
return (since there's only the top-level declaration block). If we want
to support inline modules or declarative sections within a statement
block, however, then things will get a bit more hairy.
Consider:
fun mainlet x = hidden()do println(x)where fun hidden:Int let b, c =2 ,3 then return dwhere let d = a + b + cend end let a =4 end
Here, we have a declaration (of main
), which contains some
statements (the let
and do
), but also another
declarative section (introduced by the where
keyword).
Inside that section, two more items are declared (hidden
and a
). The function itself contains some statements, one
of which is a block statement that contains a declarative section, and
that declarative section is allowed to reference items created by
statements preceeding this one.
This example is needlessly convoluted, but the ability to declare function local types and items can be quite a useful method of encapsulation, as something slightly more lightweight than a private module, for example.
Regardless, the big takeaway is that even within a file, the declared items should form something closer to a tree than a list. For the example above, we may want to end up with something like this:
(file )└─ main├─ a├─ hidden│ └─ (statements )│ ├─ b│ ├─ c│ └─ (then )│ ├─ d│ └─ (statements )└─ (statements )└─ x
Resolving
With that information, we should be able to resolve names. This pass may no longer be file local, since we might be importing items from other modules in other files. However, that is fine – we can just issue a query that runs the declaration pass on the appropirate file(s) whenever we hit an import. That way, we can ensure that the resolver pass doesn't need any more non-file-local information than absolutely necessary, which keeps the caching granular and nice.
But how exactly do we approach the resolution? When we are recursing through the syntax trees, how do we keep track of where in the declaration tree we are? This gets especially tricky with declaration blocks, like in the example, where the actual declarative section is without a name and within some statement.
We can employ another key observation here: we very rarely care about
looking inside a scope, but we very often do care about looking
outside one. Pretty much only in the case of modules do we even
care about going inside of a scope to find more names, and it's pretty
much always explicit when we're doing that (because of imports or
Module.name
syntax).
This seems to suggest that the important information is not which scopes a scope contains, but which scope is the parent of some other scope.
We still need some way of figuring out which scope we are in while traversing a syntax tree, but that can be achieved by assosciating each scope with a unique id and each scope-opening construct (block, declaration, etc) with the id of its scope.
file [0 ]:fun [1 ] main:decls :fun [2 ] hidden:stmts [3 ]:let [4 ] b =2 let [5 ] c =3 then [6 ]:decls :let [7 ] d = a + b + cstmts :return dlet [8 ] a =4 stmts [9 ]:let [10 ] x = hidden()do println(x)
0 : (file ) - main1 : main [in0 ] - a - hidden2 : hidden [in1 ]3 : (statements ) [in2 ] - b - c4 : b [in3 ]5 : c [in3 ]6 : (then ) [in3 ] - d7 : d [in6 ]8 : a [in1 ]9 : (statements ) [in1 ] - x10 : x [in9 ]
Note that each declaring construct (fun
, let
,
etc.) has been assosciated with the id of the scope it introduces. Each
scope also contains a list of the names declared in that scope in the
form it appears in the source. Ultimately, we end up with an AST type
that looks something like this:
enum Decl {Fun (ScopeId ,Name ,Vec <Decl >,Stmts ),Let (ScopeId ,Name ,Expr ),// ... }struct Stmts { scope:ScopeId , stmts:Vec <Stmt >, }enum Stmt {Do (Expr ),Let (ScopeId ,Name ,Expr ),Then (ScopeId ,Vec <Decl >,Stmts ),// ... }// ...
And a file scope type that looks like this:
enum ScopeName {File ,StatementList ,Block ,Name (Name ), }struct Scope { name:ScopeName , parent:Option <ScopeId >, children:Vec <(String ,Name )>, }struct Scopes { scopes:HashMap <ScopeId ,Scope >, }
Resolving a name just involves keeping track of the nearest scope id,
and consulting the appropriate Scopes
.
impl Scopes {fn find_name(&self , at:ScopeId , name: &str ) ->Option <Name > {let scope =self .scopes.get(id).unwrap();// search this scope for (other, res)in scope.children.iter() {if other == name {return Some (res); } }// if not found, search the parent scope scope.parent .map(|at|self .find_name(at, name) } }
There's some extra care needed to handle statements and statement scopes properly, but that's the general idea.