Depths of Programming: Let the hacking commence

In this post, I'd like to sum up all the changes and fixes I've made so far to the CL package of Elisp, as well as introduce separate tools (functions and macros) that can be useful in lexical Elisp. You can find all of them by running:

svn export http://subversion.assembla.com/svn/prog-elisp/cl-changes
svn export http://subversion.assembla.com/svn/prog-elisp/trunk

They seem to work fine in Emacs 24.0.95 on my laptop :) So, let's start.

1. Brief overview

1.1. CL package vs true Common Lisp

The CL package (which was created before the introduction of lexical scoping) doesn't provide full Common Lisp compatibility for Elisp. It should be regarded as a set of useful Common Lisp-like facilities that make Elisp programming higher-level.

When I was making my way through the CL package, I was greatly disappointed to discover that it is inconsistent with canonical Common Lisp.

So I sent an email to the Emacs-Devel mailing list describing all the (negative) CL experience I had had by that time. Since then, I have managed to find acceptable workarounds or fixes for almost all of the things I didn't like. They are all described below.

To put the main idea briefly: don't expect Elisp and the CL package to be as good as Common Lisp and its standard library. These 2 languages serve very different purposes and belong to different weight categories.

1.2. Influence of lexical scoping on the CL package

Another kind of change I'll mention here is caused by the introduction of lexical scoping in Emacs 24. Given that, a bunch of things the CL package offers become obsolete or simply unnecessary. I'm talking mainly about lexical-let and related stuff that emulates lexical scoping in dynamically scoped Elisp. Now that Elisp natively supports lexical scoping, these emulations only degrade performance and (which is much worse) the programmer's efficiency.
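To make the point concrete, here is a minimal sketch of a closure that needs nothing but a plain let once lexical scoping is on. The names my-counter-form and my-count-twice are made up for this illustration. I pass t as the second argument of eval to force lexical evaluation; in a real file you would simply put "-*- lexical-binding: t -*-" on the first line instead.

```elisp
(defvar my-counter-form
  '(let ((n 0))
     (lambda () (setq n (1+ n))))
  "Hypothetical form that builds a counter closure with a plain `let'.")

(defun my-count-twice ()
  "Evaluate `my-counter-form' lexically and call the closure twice.
The second call sees the value left by the first one, so the
closure really captured N without any `lexical-let'."
  (let ((counter (eval my-counter-form t))) ; t = lexical binding
    (funcall counter)
    (funcall counter)))

(my-count-twice) ; => 2
```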

2. CL package changes

Before we start, I'd just like to say that I tried to introduce as few changes as possible. Having to merge all these changes every time a new version of Emacs is released and the problem of cross-system compatibility both matter. Not breaking anything matters even more. So I tried to minimize changes to existing code in favor of introducing new functions where possible. Where changes were inevitable, I did my best to make them as seamless and harmless as possible.

2.1. macroexpand-1

Unlike Common Lisp, Elisp doesn't have a macroexpand-1 facility. By the way, Common Lisp lacks macroexpand-all, whereas Elisp does have it.

macroexpand-1 turns out to be really necessary, as you'll see later on. So I decided to implement it. macroexpand is a C subroutine, so there's no option other than to implement macroexpand-1 in C too. Fortunately, this was not very difficult, since macroexpand is implemented by just performing single macroexpansions in a while loop. So I just cut the loop's body and pasted it into macroexpand-1's definition (this was not as simple as it sounds, but it is essentially what was done), and then I made macroexpand spin in a loop, calling the new routine (macroexpand-1) and performing sequential macroexpansions until what we get is no longer a macro call. Pretty easy.

2.2. Places

Places are not handled gracefully by the CL package either. Yes, it is very good that they've been introduced to Elisp. But they are not quite the same as in true Common Lisp. You can read about some of the problems with them in the same email mentioned above.

Fortunately, it was easy to fix the main inconsistency. get-setf-method has been changed to call cl-macroexpand-1 when it cannot find a setf-expansion for the form. Previously it used cl-macroexpand for that, expanding all the intermediate macro forms without examining whether any of them is a setf-able place.
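Here is a sketch of the situation in which the single-step expansion matters. It does not use get-setf-method from the CL package; it uses gv.el from later Emacs versions, which, as far as I know, follows the same one-step strategy. All my-* names are hypothetical. The outer macro has no setter, the intermediate macro has one, and the final expansion is a plain function call that is not a place at all.

```elisp
(require 'gv)

(defvar my-store (list 'a 'b 'c)
  "Hypothetical backing list for the example.")

(defun my-store-ref (i)
  "Plain function without a setf expander, so not a valid place."
  (nth i my-store))

(defmacro my-get-raw (i)
  "Intermediate macro; its setter is defined right below."
  `(my-store-ref ,i))

(gv-define-setter my-get-raw (val i)
  `(setcar (nthcdr ,i my-store) ,val))

(defmacro my-get (i)
  "Outer macro with no setter of its own."
  `(my-get-raw ,i))

;; Expanding (my-get 1) by one step yields (my-get-raw 1), whose setter
;; is then found.  Expanding it fully would yield (my-store-ref 1),
;; which is not a place.
(setf (my-get 1) 'x)
```

After the setf, my-store should hold (a x c).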

2.3. Symbol macros

Common Lisp has the notion of a symbol macro and a special form named "symbol-macrolet". In the CL package, they tried to implement such a thing in Elisp as well. They would have done better not to..

There's a variable named "cl-macro-environment" which tracks, as its name says, the current macro environment (both when evaluating and when byte-compiling code). Normally it contains conses mapping symbols (macro names) to macro functions. The authors of the CL package decided to represent symbol macros as cons cells that map strings (the symbol-name of the macro symbol) to their expansions. Now comes the most disgusting thing about this: cl-macro-environment is scanned with assq, meaning that strings are compared with eq. This forces the user to care about which actual string object is stored in a symbol's symbol-name slot, which is very bad.

It is quite possible for 2 distinct symbols to share the same string as their symbol-name, in which case both get expanded with the same symbol macro definition, even though only one of them is really a symbol macro. I'll cite one of the examples you can find here:

(defmacro test-macro ()
  (let* ((symbol-1 'sym)
         (symbol-2 (make-symbol (symbol-name symbol-1))))
    `(let ((v1 0)
           (v2 0))
       (symbol-macrolet ((,symbol-1 (incf v1))
                         (,symbol-2 (incf v2)))
         ,symbol-1
         ,symbol-2
         (list v1 v2)))))

Evaluating (test-macro) gives (0 2), when it should give (1 1): that's because the same string object serves as the symbol name for 2 distinct symbols.
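The clash can be reproduced without the CL package at all. The sketch below builds a one-entry alist keyed by a name string, the way the symbol macro entries are described above, and looks it up with assq. The function name my-symbol-macro-clash and the value expansion-for-sym are made up. It relies on make-symbol storing the given string without copying it, which is what the example above implies.

```elisp
(defun my-symbol-macro-clash ()
  "Show that an `assq' lookup by name string confuses two symbols."
  (let* ((name (symbol-name 'sym))
         ;; Uninterned symbol created from the very same string object.
         (twin (make-symbol name))
         ;; Environment entry keyed by the name string of `sym' only.
         (env (list (cons name 'expansion-for-sym))))
    (list (eq 'sym twin)                        ; distinct symbols?
          (eq (symbol-name twin) name)          ; same string object?
          (cdr (assq (symbol-name twin) env))))) ; lookup for the twin

(my-symbol-macro-clash)
```

On my understanding this returns (nil t expansion-for-sym): the symbols are different, yet the twin picks up the entry that was meant for sym.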

So my advice to all Elisp programmers is: forget about symbol macros. They're not handled properly, and Elisp lived without symbol macros for a very long time. (Honestly, the language doesn't need them.) Moreover, there are many places in the CL package itself where symbol macros are handled incorrectly (the macros setf, psetf, shiftf, rotatef -- they all just check their arguments with the simple symbolp predicate, not even trying to find out whether the symbol is a symbol macro).

2.4. cl-macroexpand-all

This function's consistency is also dubious. First, in some cases it may return an expression that is not fully expanded. Look at this:

ELISP> (cl-macroexpand-all '(let (((aref x 1) 10)) x))
(letf
    (((aref x 1)
      10))
  x)

Yes, I agree that this ability to treat let like letf (and setq like setf, by the way) was probably devised for internal use by the CL package only. Nevertheless, I'm sure this behavior is erroneous.

Now look at the second example:

ELISP> (labels ((double (x) (list x x)))
         (list 'double (double 10)))
((lambda
   (x)
   (list x x))
 (10 10))

ELISP> (labels ((double (x) (list x x)))
         (list #'double (double 10)))
((lambda
   (x)
   (list x x))
 (10 10))

Do you see the problem? Under "labels", every quoted name of a locally defined function is treated as a reference to that function. So there's no way to just write the quoted symbol 'double. Also bad.
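The distinction that gets lost here is the one between (quote double) and (function double). A toy walker that respects it might look like the sketch below. It is not the code of ux-mexp-all or of the CL package; the name my-convert-function-refs and its arguments are invented for the illustration, and it only handles proper lists.

```elisp
(defun my-convert-function-refs (form name replacement)
  "Replace (function NAME) in FORM with REPLACEMENT.
Quoted data, including (quote NAME), is returned untouched.
A naive sketch: it only walks proper lists."
  (cond
   ((not (consp form)) form)
   ;; 'anything is data: never look inside.
   ((eq (car form) 'quote) form)
   ;; #'NAME is a reference to the local function.
   ((and (eq (car form) 'function) (eq (cadr form) name)) replacement)
   (t (mapcar (lambda (sub)
                (my-convert-function-refs sub name replacement))
              form))))

(my-convert-function-refs '(list 'double #'double) 'double 'my-double-var)
;; => (list (quote double) my-double-var)
```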

In order to fix cl-macroexpand-all without breaking anything, I created a separate function named ux-mexp-all (see ux.el). It is just a fixed cl-macroexpand-all: it never returns anything that is not fully expanded, and it deals correctly with quoted function names. See this:

ELISP> (ux-labels ((double (x) (list x x)))
         (list 'double (double 10) #'double))
(double
 (10 10)
 (lambda
   (x)
   (list x x)))

ux-labels is labels adjusted for lexical scoping. See the next section below.

3. flet and others in the lexical world

First, the lexical-let facility is deprecated now. Likewise, all the macros that expand into lexical-let become deprecated too. At this point, it seems to me there's only 1 such macro: flet. Second, all macros that use cl-macroexpand-all to expand their bodies also have problems (just because of cl-macroexpand-all). They are labels and macrolet. I've created their counterparts: ux-labels and ux-mlet. These use ux-mexp-all instead of cl-macroexpand-all.

As for lexical-let, the advice is quite simple: just don't use it in lexically scoped code at all.

4. A small fix to pcase.el

pcase is an Elisp library implementing ML-style pattern matching. I was very glad to see such a library appear in Emacs. Elisp programming is getting more high-level.

The only problem I found is this function:

(defun pcase--small-branch-p (code)
  (and (= 1 (length code))
       (or (not (consp (car code)))
           (let ((small t))
             (dolist (e (car code))
               (if (consp e) (setq small nil)))
             small))))

It is very naive to check whether a piece of code is small by analyzing it directly. It may contain macro calls, and a single cons "(x)" may expand into a huge piece of code. So I changed the function to this:

(defvar cl-macro-environment)
(declare-function ux-mexp-body "" (m e) t)

(defun pcase--small-branch-p (code)
  (setq code (ux-mexp-body code cl-macro-environment))
  (and (= 1 (length code))
       (or (not (consp (car code)))
           (let ((small t))
             (dolist (e (car code))
               (if (consp e) (setq small nil)))
             small))))

The new function just performs a macroexpansion before checking anything. ux-mexp-body just mapcars over the body with ux-mexp-all, that is, expands each form with ux-mexp-all, and assembles the resulting expansions in a list (new body).
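For readers without ux.el at hand, the shape of such a body expander can be sketched with the stock macroexpand-all standing in for ux-mexp-all. This is an assumption made for the illustration only; the real ux-mexp-body lives in ux.el, and my-mexp-body and my-pair are hypothetical names.

```elisp
(defun my-mexp-body (body &optional environment)
  "Return a new body in which each form of BODY is fully macroexpanded.
Uses `macroexpand-all' as a stand-in for ux-mexp-all."
  (mapcar (lambda (form) (macroexpand-all form environment))
          body))

(defmacro my-pair (x)
  "Hypothetical macro whose call looks smaller than its expansion."
  `(list ,x ,x))

(my-mexp-body '((my-pair a) b)) ; => ((list a a) b)
```

A size check applied to the result sees (list a a) rather than the innocent-looking (my-pair a), which is the whole point of expanding first.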


Sunday, May 13, 2012
