Depths of Programming: Lexical scope

Emacs 24

Emacs 24 is going to get a significant feature: Emacs Lisp will natively support lexical scoping. This might not impress you much, and to a certain degree you're right: it doesn't imply any visible UI or behavioral improvements, nor does it imply any change the end user can appreciate directly. Nevertheless, the indirect consequences of introducing lexical scope support are hard to overestimate.

With lexical scope supported, a whole bunch of powerful abstractions become accessible. Programmers thus get more powerful tools for writing complicated programs and implementing fantastic features in Emacs.

The 24th version of Emacs is not released yet, but alpha versions (pretests) have been available from the GNU FTP server (ftp://alpha.gnu.org/gnu/emacs/pretest) since autumn 2011, and they have lexical scoping implemented. (At the time of writing, the latest pretest is 24.0.96; I use 24.0.95.) So there's no reason for Elisp programmers to waste time: pick up lexical Elisp right now!

This post is for those who have started to play with lexically scoped Elisp or are determined to start.

Lexical scoping vs. dynamic scoping

First off, dynamic scoping is pervasive in Emacs and it is not possible to abandon it completely (which would be unreasonable anyway). In fact, Emacs 24 supports both lexical and dynamic scoping. How do they coexist, then?

In short, there's a buffer-local variable named "lexical-binding" that is essentially a switch between lexical and dynamic scoping.
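A minimal sketch of what the switch changes. It uses the optional second argument of eval (non-nil means lexical) instead of the buffer-local variable, so that both modes can be compared side by side. The names scope-demo-form, scope-demo-lexical, scope-demo-dynamic and scope-demo-n are made up for this illustration, and scope-demo-n is assumed not to be declared special anywhere.

```elisp
;;; -*- lexical-binding: t; -*-

(defun scope-demo-form ()
  "Return a form whose value depends on the scoping rule in effect."
  '(let ((scope-demo-n 1))
     (let ((f (lambda () scope-demo-n)))
       ;; Rebind the same name around the call to F.
       (let ((scope-demo-n 2))
         (funcall f)))))

(defun scope-demo-lexical ()
  "Evaluate the demo form with lexical scoping."
  (eval (scope-demo-form) t))

(defun scope-demo-dynamic ()
  "Evaluate the demo form with dynamic scoping."
  (eval (scope-demo-form) nil))

;; (scope-demo-lexical) => 1   the lambda remembers the binding it was created in
;; (scope-demo-dynamic) => 2   the lambda sees the innermost binding live at call time
```

The form is the same in both calls; only the scoping rule differs.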

As for the interpreter: when it evaluates a lambda form while lexical-binding is t, it creates what the Elisp manual calls "a closure". This is just a list whose car is eq to the symbol "closure". The second element of the list is the captured lexical environment of the closure, in the form of an alist (association list). It is followed by an ordinary argument list and a body. Here's an example of a closure:

ELISP> (let ((x 12))
         #'(lambda (y)
             (* x y)))
(closure
((x . 12)
t)
(y)
(* x y))

To convince yourself that a closure is nothing more than a list of a special shape, you can do the following:

ELISP> (fset 'func '(closure ((x1 . 1) (x2 . 1)) (x3)
                             (prog1
                                 x1
                               (setq x1 x2
                                     x2 x3))))
(closure
((x1 . 1)
(x2 . 1))
(x3)
(prog1 x1
   (setq x1 x2 x2 x3)))

ELISP> (func 10)
1
ELISP> (func 10)
1
ELISP> (func 12)
10
ELISP> (func 13)
10
ELISP> (func 22)
12
ELISP> (func 109)
13

(The above is something like a finite-state transducer that returns its input sequence delayed by two positions.)
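The same transducer can also be written without hand-building the closure list: under lexical binding, an ordinary lambda closes over the arguments of the enclosing function. This is only a sketch; make-delay-2 and delay-demo are names invented here, and the file is assumed to be loaded with lexical-binding set to t.

```elisp
;;; -*- lexical-binding: t; -*-

(defun make-delay-2 (x1 x2)
  "Return a function that echoes its inputs delayed by two calls.
X1 and X2 are the values returned by the first two calls."
  (lambda (x3)
    (prog1 x1
      ;; Shift the two-element window: drop X1, remember X3.
      (setq x1 x2
            x2 x3))))

(defun delay-demo ()
  "Feed the inputs from the session above through a fresh transducer."
  (mapcar (make-delay-2 1 1) '(10 10 12 13 22 109)))

;; (delay-demo) => (1 1 10 10 12 13)
```

Each call to make-delay-2 returns a transducer with its own private x1 and x2, so several of them can run side by side.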

As for the byte compiler, it is again the "lexical-binding" variable that chooses the way to go. When the byte compiler generates code, it looks at that value; if it is t, it generates code such that closures are created at runtime, in the form of ordinary byte-compiled function objects.

Look at the following function:

(defun make-multiplier (x)
  (lambda (y)
    (* x y)))

If we try to compile this, here's what we get:

byte code for make-multiplier:
doc:   ...
args: 257
0    constant make-byte-code
1    constant 257
2    constant "\300 _\207"
3    constant vconcat
4    constant vector
5    stack-ref 5
6    call      1
7    constant []
8    call      2
9    constant 3
10    constant "\n\n(fn Y)"
11    call      5
12    return

So make-multiplier calls make-byte-code at runtime to create a fresh byte-compiled function object, and returns it. Each call to make-multiplier creates and returns a new, distinct function, and each of these is closed over its own distinct location named "x" in the code above. This is exactly how closures and lexical scoping are supposed to work.
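Seen from the Lisp side, this means that two calls to make-multiplier give two functions that do not interfere with each other. A small sketch, assuming the file is loaded with lexical-binding set to t; the helper multiplier-demo is my own name, not part of Emacs.

```elisp
;;; -*- lexical-binding: t; -*-

(defun make-multiplier (x)
  (lambda (y)
    (* x y)))

(defun multiplier-demo ()
  "Call two independent closures made by `make-multiplier'."
  (let ((double (make-multiplier 2))
        (triple (make-multiplier 3)))
    (list (funcall double 10)     ; uses the x of the first call
          (funcall triple 10)     ; uses the x of the second call
          (eq double triple))))   ; two distinct function objects

;; (multiplier-demo) => (20 30 nil)
```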

When lexical-binding is nil, the old behavior is in effect: the interpreter creates simple lambdas instead of closures, and the byte compiler compiles function objects at byte-compile time rather than arranging for them to be created dynamically at runtime.

Defvar

When lexical scoping is on, the "defvar" special form can be used to make particular bindings dynamic instead of lexical (which is the default).

I haven't found a thorough explanation of how defvar now works with regard to lexical scoping, so I did some research and will now try to share the results with you.

Value is present

First of all, a new symbol attribute has been introduced: every symbol now has a bit flag indicating whether it names a special variable. You can read that flag with the "special-variable-p" predicate. Its C source is very simple:

DEFUN ("special-variable-p", Fspecial_variable_p, Sspecial_variable_p, 1, 1, 0,
       doc: /* ... */)
(Lisp_Object symbol)
{
   CHECK_SYMBOL (symbol);
   return XSYMBOL (symbol)->declared_special ? Qt : Qnil;
}

So it just checks a declared_special bit field.

Second, when you evaluate "(defvar <var> <val>)", the variable <var> receives the property of being special, that is, any symbol you put in place of <var> gets that declared_special bit set. For example:

ELISP> (defvar my-var 10)
my-var
ELISP> lexical-binding
t
ELISP> (let ((my-var 12))
         (lambda ()
           my-var))
(closure
(t)
nil my-var)

See that the closure hasn't captured my-var? In fact, let just checks whether my-var has been declared special by looking at its declared_special field. In this example my-var really is a special variable, so let binds it dynamically, the same way it did in all versions of Emacs prior to 24.
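The consequences can be checked with a short sketch. All names starting with special-demo are hypothetical and exist only for this illustration.

```elisp
;;; -*- lexical-binding: t; -*-

(defvar special-demo-var 10
  "Hypothetical variable; the defvar with a value marks it special.")

(defun special-demo-read ()
  "Return the current dynamic value of `special-demo-var'."
  special-demo-var)

(defun special-demo ()
  "Collect three observations about `special-demo-var'."
  (list
   ;; The flag set by defvar.
   (special-variable-p 'special-demo-var)
   ;; The let binding is dynamic, so the callee sees it.
   (let ((special-demo-var 12))
     (special-demo-read))
   ;; Nothing is captured: once let exits, the lambda reads the
   ;; global value again.
   (funcall (let ((special-demo-var 12))
              (lambda () special-demo-var)))))

;; (special-demo) => (t 12 10)
```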

As for byte compilation, the byte compiler behaves similarly: if it sees a "let" form binding a variable that has the declared_special bit set, it generates code that binds the variable dynamically at runtime (with a byte code primitive named "varbind"). Otherwise the binding is lexical, and the generated code just keeps the value on the stack, without touching the symbol-value slot of the variable being bound.

Value is not present

That's not all, though. When you evaluate "(defvar <var>)" without specifying any value, defvar doesn't set the declared_special flag of <var>.

When interpreting Elisp code, defvar just remembers that <var> is special by pushing it onto the alist stored in the internal variable Vinternal_interpreter_environment. (This variable is not exposed to the Elisp layer at all; it is just a C variable.) This alist has the same form and semantics as the captured lexical environments of closures mentioned above: a cons cell element (VAR . VAL) means that VAR is a lexical variable in effect, having the value VAL. A plain symbol (non-cons) in the alist means that this symbol is a dynamically bound variable, and should be bound dynamically by all subsequent lets.

So now you understand how the interpreter handles lexical environments: it tracks the environment (in Vinternal_interpreter_environment) during evaluation, augmenting it in let blocks and in defvar forms. When the interpreter has to create a closure, it gives it the current value of Vinternal_interpreter_environment. That value is what the closure captures.

Let's see something concrete:

ELISP> (setq x
             (let ((w 10))
               (defvar w)
               #'(lambda (y)
                   (+ w y))))
(closure
(w
(w . 10)
t)
(y)
(+ w y))
ELISP> (funcall x 12)
22
ELISP> w
*** Eval error *** Symbol's value as variable is void: w

The closure has captured the lexical environment "(w (w . 10) t)". We'll examine it from end to beginning:

t -- for some reason, all closures contain t there. It says that the variable "t" is dynamic (not captured). By all evidence this is just an implementation technique chosen by the people who worked on this. You can simply pay no attention to those captured t symbols.
(w . 10) -- this was pushed onto Vinternal_interpreter_environment by let.
w -- this was added by defvar. This entry does not shadow the previous one, and the resulting closure captures "w" with the value of 10.

Now see what happens if we move defvar outside of let:

ELISP> (progn
         (defvar w)
         (let ((w 10))
           #'(lambda (y)
               (+ w y))))
(closure
(w t)
(y)
(+ w y))
ELISP> (funcall (progn
                  (defvar w)
                  (let ((w 10))
                    #'(lambda (y)
                        (+ w y))))
                10)
*** Eval error *** Symbol's value as variable is void: w

In this case, the closure's captured environment doesn't have "(w . 10)" at all, since by the time let is evaluated, the interpreter is already aware that "w" has been declared dynamic (it appears inside Vinternal_interpreter_environment as a plain symbol).
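The effect of a valueless defvar on a following let can also be observed without looking at closure internals. The sketch below runs the interpreter explicitly through eval with a non-nil second argument (lexical scoping); all names starting with valueless-demo are invented for this illustration.

```elisp
;;; -*- lexical-binding: t; -*-

(defun valueless-demo-peek ()
  "Return the dynamic value of `valueless-demo-w', or `unbound' if none."
  (if (boundp 'valueless-demo-w)
      (symbol-value 'valueless-demo-w)
    'unbound))

(defun valueless-demo-without-defvar ()
  "No declaration is in scope, so let binds the variable lexically."
  (eval '(let ((valueless-demo-w 10))
           (valueless-demo-peek))
        t))

(defun valueless-demo-with-defvar ()
  "A valueless defvar precedes the let, so the binding is dynamic."
  (eval '(progn
           (defvar valueless-demo-w)
           (let ((valueless-demo-w 10))
             (valueless-demo-peek)))
        t))

;; (valueless-demo-without-defvar)        => unbound
;; (valueless-demo-with-defvar)           => 10
;; (special-variable-p 'valueless-demo-w) => nil
```

The last line shows that the declaration stays local to the evaluation it appears in: the declared_special flag of the symbol is never set.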

Valueless defvar and byte compilation

The byte compiler also treats valueless defvars specially in some cases. But don't expect it to provide the same level of convenience as Common Lisp does. After all, that functionality has not yet been released, and Elisp is not a general-purpose language.

To get the point, let's start with an example:

(defun make-multiplier ()
  (defvar m1-x)
  (let ((m1-x (buffer-size)))
    #'(lambda (m2)
        (* m1-x m2))))

This is compiled into:

byte code for make-multiplier:
doc:   ...
args: 0
0    constant buffer-size
1    call      0
2    varbind   m1-x
3    constant make-byte-code
4    constant 257
5    constant "\300 _\207"
6    constant vconcat
7    constant vector
8    varref    m1-x
9    call      1
10    constant []
11    call      2
12    constant 3
13    constant "\n\n(fn M2)"
14    call      5
15    unbind    1
16    return

So, when a variable has been declared special with defvar and then bound with let, what we end up with is a dynamic binding ("varbind m1-x"). This behavior is the same as in the interpreted case.

Now try to evaluate (M-:) this:

(disassemble (make-multiplier))

You get something like:

byte code:
  doc:   ...
  args: 257
0    constant  349
1    stack-ref 1
2    mult      
3    return

This means that m1-x was captured by the closure returned from make-multiplier, although it was bound dynamically by the enclosing let. This behavior is contrary to that of the interpreter. Surprising, isn't it?

Putting "(defvar m1-x)" at the beginning of the lambda body doesn't change anything, nor does inserting "(defvar m1-x)" at the beginning of the let body. In all cases m1-x is captured as a lexical variable.

The conclusion is this: when the byte compiler decides whether to capture an enclosing variable, it considers only its declared_special attribute. Try using a defvar with some value specified, and you will see that m1-x is no longer captured as lexical.
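A sketch of that last experiment. I use fresh, hypothetical names (special-m1-x, make-dynamic-multiplier, dynamic-multiplier-demo) and a constant instead of (buffer-size), so that the result does not depend on the current buffer.

```elisp
;;; -*- lexical-binding: t; -*-

(defvar special-m1-x 1
  "Hypothetical variable; the initial value makes it globally special.")

(defun make-dynamic-multiplier ()
  (let ((special-m1-x 100))
    (lambda (m2)
      (* special-m1-x m2))))

;; Compile the function, so that what we observe is the byte
;; compiler's decision and not the interpreter's.
(byte-compile 'make-dynamic-multiplier)

(defun dynamic-multiplier-demo ()
  "Call the returned function outside and inside a dynamic binding."
  (list (funcall (make-dynamic-multiplier) 2)
        (let ((special-m1-x 5))
          (funcall (make-dynamic-multiplier) 2))))

;; (dynamic-multiplier-demo) => (2 10)
```

The value 100 never shows up in the results: the binding made inside make-dynamic-multiplier is dynamic and is already undone by the time the returned function runs, so the function reads whatever value special-m1-x has at call time.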

So how to use all that? Conclusion

I would recommend that everybody follow a few rules when dealing with scoping:

1. If you don't need lexical binding in a particular file (i.e. library), just don't use it and that's all. Otherwise, see item 2.

2. If you do need lexical binding, then add the file-local variable "lexical-binding" (use "add-file-local-variable-prop-line" for that). Lexical scoping will then be in effect by default in that file.

3. If you want to create a global variable (possibly internal to your library), feel free to do this at file level with defvar. Your variable will be dynamic throughout your Emacs system: it won't be captured by closures, and let-bindings of it will act as dynamic ones. The same is true of defconst, by the way.

4. The previous three items cover most typical Elisp programming needs. If you want to be able to use something like:

(let ((y 10))
  (declare (special y))
  ...)

as you would do in Common Lisp, and get equivalent behavior and semantics, then just forget about this and look for other ways around :) Keep in mind that defvar won't do the same job for Elisp as the "special" declaration currently does for CL; defvar is much less powerful.
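Items 2 and 3 put together might look like the following skeleton. The library name my-counter and everything in it are invented for this sketch; the only fixed parts are the lexical-binding cookie on the first line and the file-level defvar with a value.

```elisp
;;; my-counter.el --- hypothetical example library  -*- lexical-binding: t; -*-

(defvar my-counter-step 1
  "Increment used by counters.
Declared at file level with a value, so it is special everywhere
and callers may `let'-bind it.")

(defun my-counter-make (start)
  "Return a closure yielding START and then successive increments."
  (lambda ()
    (prog1 start
      ;; START is lexical and private to this closure;
      ;; `my-counter-step' is looked up dynamically on each call.
      (setq start (+ start my-counter-step)))))

(defun my-counter-demo ()
  "Show a lexical counter driven by a dynamically bound step."
  (let ((c (my-counter-make 0)))
    (list (funcall c)
          (funcall c)
          (let ((my-counter-step 10))
            (funcall c))
          (funcall c))))

;; (my-counter-demo) => (0 1 2 12)
```

The third call still returns 2, but it advances the counter by the temporarily bound step of 10, which is why the fourth call returns 12.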

Saturday, May 12, 2012
