Saturday, December 15, 2012

Aligning assignments to the same column




I've recently written two interactive Emacs commands for editing programs. The first command aligns a series of assignment operators written on consecutive lines, so that the equal signs ("=") end up in the same column, which looks quite pretty and is preferred by some coding conventions:


[Screenshots lost: the same block of assignments, unaligned vs. aligned]



The second command is useful when you're programming in C and use multiline macros (#defines) heavily.  It appends a backslash to the end of each line that constitutes the macro definition, with the backslashes aligned so that they all appear in the same column.  When you edit your macro definition, the backslashes cease to be aligned; it is then sufficient to call this command again, and it will realign them for you:



Aligning assignments

Here's the code that aligns assignments:

(require 'cl)  ; for `flet', `loop', `callf' and `assert'

(defun eld-align-assignments ()
  "Align the \"=\" signs of consecutive assignment lines to one column."
  (interactive)
  (save-excursion
    (let ((markers ())           ;retain all "=" positions here.
          (col-offsets ())       ;corresponding column offsets
          (longest-offset 0))
      (unwind-protect
          (flet
              ((examine-current-line ()
                 ;; Return the position of "=" if the current
                 ;; line looks like an assignment.  Otherwise,
                 ;; return nil.  Point is not moved.
                 (assert (bolp))
                 (when (looking-at "\\(?:\\s-\\|\\sw\\|\\s_\\|\\s.\\)*\\(=\\)")
                   (push (copy-marker (match-beginning 1)) markers)
                   (let ((col-offset (- (match-beginning 1) (point))))
                     (callf max longest-offset col-offset)
                     (push col-offset col-offsets)
                     t))))
            (forward-line 0)
            (examine-current-line)
            ;; if the first line did not match, it doesn't tell us anything: a
            ;; user can launch the command from the following line.
            (loop 
             do
             (when (= (forward-line -1) -1) (return)) ;reached (point-min)
             (unless (examine-current-line) (return)))
            ;; okay, now we have longest-offset and know how many spaces to put
            ;; at each marker
            (loop
             for marker in markers
             for col-offset in col-offsets
             do
             (goto-char marker)
             (insert-char ?\s (- longest-offset col-offset))))
        (dolist (m markers)
          (set-marker m nil))))))


The logic is quite simple-minded: examine the lines above the current one,
trying to recognize those that look like assignments.  In this respect the
algorithm is not sophisticated, as you can see: we just match a regular
expression from the beginning of each line, and that's it.  This should
suffice for simple cases; however, assignments whose left-hand side contains
an "=" itself (say, inside an array subscript) may not be handled properly.
That is not very common, though: in most situations assignments are made to
simple variables.

Once a line has been recognized as an assignment, we place a marker before
the equal sign; we also remember the largest column offset at which an equal
sign appears.  Having determined that longest offset, the only thing left to
do is to pad each "=" with spaces so that they all move to the same column --
the longest one.

Note: in this snippet I used the flet macro, which I advised against in one of
my previous posts.  I did that because I want people to be able to just copy-paste
this command and have it work for them.  For those who want to take the long way,
consider throwing flet out.


Aligning slashes

(defun eld-realign-c-macro/region (start end)
  "Append a backslash to each line that ends within the given region.
Backslashes are aligned to the same column, as is usually done in C.
If a line already ends in a backslash, this is handled gracefully."
  (interactive "r")
  (setq end (copy-marker end t))
  (unwind-protect
      (save-excursion
        (goto-char start)
        (let ((max-column 0))
          (loop
           (end-of-line 1)
           (unless (< (point) end) (return))
           ;; check whether the preceding char is a backslash
           (when (= (preceding-char) ?\\)
             (delete-char -1))
           (delete-horizontal-space t)  ;delete backward whitespace only
           (setq max-column (max max-column (current-column)))
           (forward-char))              ;move to the next line, 1 char is ok
          ;; now we've determined max-column; proceed with aligning slashes
          (goto-char start)
          (loop
           (end-of-line 1)
           (unless (< (point) end) (return))
           (move-to-column max-column t)
           (insert-char ?\\ 1)
           (forward-char))))
    (set-marker end nil)))

(defun eld-realign-c-macro ()
  "Like `eld-realign-c-macro/region' but determines the region itself.
The function does so by calling `c-beginning-of-macro' and
`c-end-of-macro'."
  (interactive)
  (eld-realign-c-macro/region
   (save-excursion (c-beginning-of-macro) (point))
   (save-excursion (c-end-of-macro) (point))))

The algorithm resembles the previous one: examine each line (this time the
end of it) and remember the column number where a backslash should be
placed.  Then find maximum of these numbers and place backslashes in each
line there.

To determine the beginning and end of a macro definition, functions
c-beginning-of-macro and c-end-of-macro are used.  They are the standard
ones from the C mode.

There's quite an interesting idiom in the function eld-realign-c-macro
above: save the excursion, execute code that moves point to the desired
location, and return that location by calling (point).  This could be
captured as a dedicated macro:

(defmacro point-would-be (&rest body)
  `(save-excursion
     ,@body
     (point)))
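With it, eld-realign-c-macro reads a bit more directly:

```elisp
(defun eld-realign-c-macro ()
  (interactive)
  (eld-realign-c-macro/region
   (point-would-be (c-beginning-of-macro))
   (point-would-be (c-end-of-macro))))
```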

I hope someone will find these things useful.  Thanks for your attention!


Wednesday, October 3, 2012

CL "block" and code walking

This article is a kind of preparation for a series of more complicated and elaborate ones. Here we'll talk about a technical nuance of the block/return-from macros from the CL package, one that can reveal itself when you do code walking. So, let's start.

Block and return-from

The CL package contains a pair of Common Lisp-like macros named block and return-from. These are supposed to emulate lexical non-local exits, as in true Common Lisp.
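A typical use looks like this (a sketch; items and matches-p are made-up names):

```elisp
(block find-it
  (dolist (x items)
    (when (matches-p x)
      ;; jump straight out of the block, returning x
      (return-from find-it x)))
  nil)
```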



This trick works because block expands into catch, and return-from expands into throw. See this:
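The original post showed the expansion interactively; approximately, it looks like this (a reconstruction, not verbatim output):

```elisp
;; (block outer (return-from outer 42)) expands into roughly:
(catch '--cl-block-outer--
  (throw '--cl-block-outer-- 42))
```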



(ux-mexp-all is an improved version of cl-macroexpand-all; see my previous post named "Let the hacking commence").


As you can see, block has been expanded into a catch establishing a special tag (named --cl-block-outer--, which is an interned symbol, by the way), and return-from has become a throw form, throwing the specified return value to the corresponding catch.

So, lexically-scoped non-local exits are really absent in Elisp; Elisp only has dynamic non-local exits: catch and throw. Lexical behavior is emulated by choosing specific names for the catch tag, so that no other throw call can inadvertently bring us to the catch that a block was expanded into.

Optimization (catch elimination)

The CL package provides one (pretty trivial) optimization for the block/return-from tandem: the elimination of the enclosing catch in the case when no return-from matching the block is encountered within the block's scope. We'll call this catch elimination.

For example, in the sample below block is completely unnecessary:
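The lost snippet was along the lines of this minimal example (no return-from anywhere inside):

```elisp
(block outer
  (+ 1 2))
```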


Let's see what it is expanded into:
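The expansion shown was, approximately (a reconstruction):

```elisp
;; Without compiler-macro-expansion the catch survives,
;; wrapped in a call to cl-block-wrapper:
(cl-block-wrapper
 (catch '--cl-block-outer--
   (+ 1 2)))
```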


So why has the catch not been eliminated, as I just said it should be? And what is cl-block-wrapper?

Look attentively: can you see any difference between the example of macroexpansion above and this one?

Yes, you're right: here we haven't bound cl-macroexpand-cmacs to t around the call to ux-mexp-all. Now let's bind it and repeat the experiment:


Ha? It turns out that the optimization takes place only when cl-macroexpand-cmacs is t, that is, during compiler-macro-expansion.

cl-block-wrapper and cl-block-throw

Let us expand the piece of code to the full extent, but without performing compiler-macro-expansion. Here's what we get:


This resembles one of the previous samples: block actually expands into cl-block-wrapper, and return-from -- into cl-block-throw. Here are the definitions of the two:
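From memory, in the CL package of that time these were trivial aliases (cited approximately, not verbatim from cl.el):

```elisp
;; From cl.el (approximately):
(defalias 'cl-block-wrapper 'identity)
(defalias 'cl-block-throw 'throw)
```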


block first checks whether its body is "safe". If it is, it is guaranteed that no return-froms are present within the body, and the catch can be omitted right away. This check is also a kind of optimization, though not important for our discussion. (I think they added it so that block performs better under the interpreter.)

Otherwise, block expands into a call to cl-block-wrapper wrapping a catch form that contains the body originally passed to block. return-from expands into a call to cl-block-throw with the same quoted tag symbol.


The compiler macros for cl-block-wrapper and cl-block-throw are not as simple as the function definitions:
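For reference, they are shaped approximately like this (paraphrased from my memory of cl-macs.el of that era; not exact source):

```elisp
(define-compiler-macro cl-block-wrapper (cl-form)
  ;; cl-form is (cl-block-wrapper (catch 'TAG BODY...))
  (let* ((cl-entry (cons (nth 1 (nth 1 cl-form)) nil))
         (cl-active-block-names (cons cl-entry cl-active-block-names))
         (cl-body (macroexpand-all      ;performs compiler-macro-expansions
                   (cons 'progn (cddr (nth 1 cl-form)))
                   macroexpand-all-environment)))
    (if (cdr cl-entry)
        ;; a matching cl-block-throw was seen: keep the catch
        `(catch ,(nth 1 (nth 1 cl-form)) ,@(cdr cl-body))
      ;; no matching throw: the catch is eliminated
      cl-body)))

(define-compiler-macro cl-block-throw (cl-tag cl-value)
  (let ((cl-found (assq (nth 1 cl-tag) cl-active-block-names)))
    (if cl-found (setcdr cl-found t)))
  `(throw ,cl-tag ,cl-value))
```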


cl-active-block-names is an alist mapping catch tags (which directly correspond to block names) to boolean flags (t/nil). These flags show us whether the optimization is possible: initially all the flags are set to nil, and macroexpand-all is invoked. The latter function performs compiler-macro-expansions, and the compiler macro for cl-block-throw sets corresponding flags to t, thus showing us that a return-from targeting the respective block has been encountered and the catch elimination is impossible. So after the return from macroexpand-all, we know whether we can eliminate the enclosing catch or not.

What is wrong about all this

There's one little (potential) problem with this approach: it relies on an implicit assumption that cl-block-wrapper wraps a catch form corresponding to a block. Everything is okay as long as you don't do advanced Emacs Lisp programming (such as code walking, for instance). But when you try to make certain manipulations on code, your meta-programming facilities treat cl-block-wrapper as an ordinary function (which it is) and don't know that they must not lexically detach it from the catch form it wraps.
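To make the problem concrete, consider a code walker that lifts the argument of cl-block-wrapper into a temporary variable (a hypothetical but semantics-preserving rewrite):

```elisp
;; Original shape: the compiler macro sees the catch form directly.
(cl-block-wrapper
 (catch '--cl-block-foo--
   42))

;; After the rewrite, cl-block-wrapper no longer receives
;; a literal catch form, and its compiler macro breaks:
(let ((result (catch '--cl-block-foo--
                42)))
  (cl-block-wrapper result))
```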


The above two pieces of code ought to be equivalent, but they are not: the second one will fail when you try to compile it, because of how the compiler macro for cl-block-wrapper deals with its argument (see its definition above).

Amendments for CL

Make it correct

So, the first and most basic correction is to make the compiler macro for cl-block-wrapper check its argument explicitly: when the argument doesn't look like a catch form, leave it alone; otherwise, do the usual optimization.
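A sketch of such a guard (the body of the original optimization is elided; cl-block-wrapper--optimize is a hypothetical helper standing in for it):

```elisp
(define-compiler-macro cl-block-wrapper (cl-form)
  (let ((arg (nth 1 cl-form)))
    (if (and (consp arg) (eq (car arg) 'catch))
        ;; looks like the catch form produced by `block':
        ;; proceed with the usual catch-elimination logic
        (cl-block-wrapper--optimize cl-form)
      ;; anything else (e.g. after code walking): leave it alone
      cl-form)))
```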


Make it perform better

With this change we've made the optimization technique correct, because it is simply skipped when the argument of cl-block-wrapper doesn't appear to be a catch form. But even in that case the optimization could still be performed.

The fundamental flaw of this compiler-macro optimization is that source-level optimizations are better done at macroexpansion time, not at compiler-macro-expansion time. Compiler macros are supplementary by nature; they are advisory, optional. Such an optimization belongs in the macro definition itself.

So we're going to do exactly that -- move the optimization logic into the definition of block.


ux-alternative-block-expander is an alternative expander function for the block macro. It also uses the cl-active-block-names alist to remember which return-froms were encountered. But instead of calling macroexpand-all, we now rely upon our home-made, corrected expansion function ux-mexp-all (ux-mexp-implicit-progn is just a helper that calls ux-mexp-all).

ux-alternative-return-from-expander also looks much like the CL compiler macro for return-from: cl-active-block-names is scanned for block-name; if it is found, we set the corresponding flag and expand into an ordinary throw. Otherwise, an error is signaled (return-from makes no sense outside of a block).

Finally, ux-establish-alternative-block is a function that takes an environment env and returns that same environment, possibly augmented with the alternative expanders for block and return-from. If the alternative definitions are already established in env, we just return env unchanged; otherwise, two new conses are pushed onto env and the result is returned.

How to use ux-establish-alternative-block.

When you need to expand a form to the full depth and then to make some further manipulations on the result, you most probably need to augment the environment you've captured with the alternative definitions for block and return-from. The augmented environment should be used in a subsequent call to ux-mexp-all. This will ensure all catch eliminations will take place during the macroexpansion, and what you get will be the final result.

That's all I wanted to tell here. I perfectly realize that this post describes yet another kludge for Emacs Lisp and the CL package; it doesn't look like something new, useful and valuable. Nevertheless, these alternative macro definitions of block and return-from will help us greatly in the higher-level stuff I hope to publish later.

Thank you for your attention.

Sunday, May 13, 2012

Let the hacking commence

In this post, I'd like to sum up all the changes and fixes I've made so far to the CL package of Elisp, as well as introduce separate tools (functions and macros) that can be useful in lexical Elisp.  You can find all of them by running:

svn export http://subversion.assembla.com/svn/prog-elisp/cl-changes
svn export http://subversion.assembla.com/svn/prog-elisp/trunk


They seem to work fine in Emacs 24.0.95 on my laptop :) So, let's start.

1. Brief overview

 

1.1. CL package vs true Common Lisp

The CL package (which was created before the introduction of lexical scoping) doesn't provide full Common Lisp compatibility for Elisp.  It should rather be regarded as a set of useful Common Lisp-like facilities that make Elisp programming higher-level.

When I was making my way through the CL package, it was a great disappointment for me to discover that the package is inconsistent with canonical Common Lisp.

So I sent an email to the emacs-devel mailing list describing all the (negative) CL experience I had accumulated by that time.  Since then, I have managed to find acceptable workarounds for, or outright fix, almost all of the things I didn't like.  They will all be described below.

To put the main idea shortly: don't expect Elisp and the CL package to be as good as
Common Lisp and its standard library.  These two languages serve very different purposes and belong to different weight categories.

 

1.2. Influence of lexical scoping on the CL package

The other sort of things I'll mention here are changes caused by the introduction of lexical scoping in Emacs 24.  Given that, a bunch of things the CL package offers become either obsolete or entirely unneeded.  I'm talking mainly about lexical-let and the related stuff that emulates lexical scoping in dynamically-scoped Elisp.  Since Elisp now natively supports lexical scoping, all these things are obsolete: they degrade performance and (which is much worse) the programmer's efficiency.

 

2. CL package changes

Before we start, I'd just like to say that I tried to introduce as few changes as possible.  The burden of having to merge all these changes every time a new version of Emacs is released, and the problem of cross-system compatibility, both matter.  Not breaking anything is a still more important thing to take care of.  So I tried to minimize changes to existing code in favor of introducing new functions, where possible.  In those places where changes were inevitable, I did my best to make them as seamless and harmless as possible.

 

2.1. macroexpand-1

Elisp doesn't have a macroexpand-1 facility, contrary to Common Lisp.  (Conversely, Common Lisp lacks macroexpand-all, whereas Elisp does have it.)

macroexpand-1 turns out to be really necessary, as you'll see later on, so I decided to implement it.  macroexpand is a C subroutine, so there's no option other than to implement macroexpand-1 in C too.  Fortunately, this was not very difficult, since macroexpand is implemented as a while loop performing single macroexpansions.  So I cut the loop's body and pasted it into macroexpand-1's definition (not quite as simple as it sounds, but that is essentially what was done), and then made macroexpand spin in a loop calling the new routine and performing successive macroexpansions until the result is no longer a macro call.  Pretty easy.
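In Elisp terms, the resulting macroexpand is conceptually just a loop over macroexpand-1 (the real one stays in C; this is a sketch):

```elisp
(defun my-macroexpand (form &optional environment)
  (let (expansion)
    ;; `macroexpand-1' returns FORM itself (eq) when FORM is not
    ;; a macro call, which terminates the loop.
    (while (not (eq form (setq expansion (macroexpand-1 form environment))))
      (setq form expansion))
    form))
```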

2.2. Places

Places are not handled gracefully by the CL package either.  Yes, it is very good that they've been introduced to Elisp, but they are not quite the same as in true Common Lisp.  You can read about some of the problems with them in that same email mentioned above.

Fortunately, it was easy to fix the main inconsistency: get-setf-method has been changed to call cl-macroexpand-1 when it cannot find a setf-expansion for the form.  Previously it used cl-macroexpand for that, expanding all the intermediate macro forms at once and never examining whether any of them is itself a setf-able place.

2.3. Symbol macros

Common Lisp has the notion of a symbol macro and a special form named
"symbol-macrolet".  The CL package tries to implement such a thing in Elisp as well.  It would have been better not to..

There's a variable named "cl-macro-environment" which tracks, as its name says, the current macro environment (both when evaluating and when byte-compiling code).  Normally it contains conses mapping symbols (macro names) to macro functions.  The authors of the CL package decided to represent symbol macros as cons cells mapping strings (the symbol-name of the macro symbol) to their expansions.  Now comes the most disgusting part: cl-macro-environment is scanned with assq, meaning that strings are compared with eq.  This forces a user to care about which actual string object is stored in a symbol's symbol-name slot, which is very bad.
It is quite possible for two distinct symbols to share the same string as their
symbol-name; then both get expanded with the same symbol macro definition, although only one of them is really a symbol macro.  I'll cite one of the examples you can find here:

(defmacro test-macro ()
  (let* ((symbol-1 'sym)
         (symbol-2 (make-symbol (symbol-name symbol-1))))
    `(let ((v1 0)
           (v2 0))
       (symbol-macrolet ((,symbol-1 (incf v1))
                         (,symbol-2 (incf v2)))
         ,symbol-1
         ,symbol-2
         (list v1 v2)))))

This code returns (0 2) when it should return (1 1): that's because the same string object serves as the symbol name of two distinct symbols.

So my advice to all Elisp programmers: forget about symbol macros.  They're not
handled properly, and Elisp did perfectly well without symbol macros for a very long
time.  (Honestly, the language doesn't need them.)  Moreover, there are
many places in the CL package itself where symbol macros are handled incorrectly (the macros setf, psetf, shiftf and rotatef all just check their arguments with the simple symbolp predicate, without even trying to find out whether the argument is a symbol macro).

 

2.4. cl-macroexpand-all

This function's consistency is also dubious.  First, in some cases it may return
a non-fully-expanded expression.  Look at this:

ELISP> (cl-macroexpand-all '(let (((aref x 1) 10)) x))
(letf
    (((aref x 1)
      10))
  x)

Yes, I agree that this ability to treat let like letf (and setq like setf, by the way) was probably devised to be used only internally by the CL package.  Nevertheless, I'm sure this behavior is erroneous anyway.

Now look at the second example:

ELISP> (labels ((double (x) (list x x)))
         (list 'double (double 10)))
((lambda
   (x)
   (list x x))
 (10 10))

ELISP> (labels ((double (x) (list x x)))
         (list #'double (double 10)))
((lambda
   (x)
   (list x x))
 (10 10))

Do you see the problem?  Under labels, all quoted occurrences of a defined
function's name are treated as references to that function.  So there's no way to
just write the quoted symbol 'double.  Also bad.

In order to fix cl-macroexpand-all without breaking anything, I created a separate function named ux-mexp-all (see ux.el).  This is just a fixed
cl-macroexpand-all: it never returns anything non-fully expanded, and it deals correctly with quoted function names.  See this:

ELISP> (ux-labels ((double (x) (list x x)))
         (list 'double (double 10) #'double))
(double
 (10 10)
 (lambda
   (x)
   (list x x)))

ux-labels is labels adjusted for lexical scoping.  See the next section below.

3. flet and others in the lexical world

First, the lexical-let facility is now deprecated.  Likewise, all the macros that expand into lexical-let are deprecated too; at this point, it seems to me there is only one such macro: flet.  Second, all macros that use cl-macroexpand-all
to expand their bodies also have problems (precisely because of cl-macroexpand-all): these are labels and macrolet.  I've created their counterparts, ux-labels and ux-mlet, which use ux-mexp-all instead of cl-macroexpand-all.

As for lexical-let, the piece of advice is quite simple: just don't use it in lexical-scoping code at all.

4. A small fix to pcase.el

pcase is an Elisp library implementing ML-style pattern matching.  I was very
glad to see such a library appear in Emacs: Elisp programming keeps getting
more high-level.

The only problem I found is this:

(defun pcase--small-branch-p (code)
  (and (= 1 (length code))
       (or (not (consp (car code)))
           (let ((small t))
             (dolist (e (car code))
               (if (consp e) (setq small nil)))
             small))))

It is very naive to check whether a piece of code is small by analyzing it directly: it may contain macro calls, and a single cons "(x)" may expand
into a huge piece of code.  So I changed that function to this:

(defvar cl-macro-environment)
(declare-function ux-mexp-body "" (m e) t)

(defun pcase--small-branch-p (code)
  (setq code (ux-mexp-body code cl-macro-environment))
  (and (= 1 (length code))
       (or (not (consp (car code)))
           (let ((small t))
             (dolist (e (car code))
               (if (consp e) (setq small nil)))
             small))))

The new function simply performs a macroexpansion before checking anything. ux-mexp-body just mapcars over the body with ux-mexp-all; that is, it expands each form with ux-mexp-all and assembles the resulting expansions into a list (the new body).
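In other words, ux-mexp-body presumably amounts to something like this (a guess at the helper's shape; see ux.el for the real definition):

```elisp
(defun ux-mexp-body (body env)
  (mapcar (lambda (form) (ux-mexp-all form env)) body))
```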

Saturday, May 12, 2012

Lexical scope

Emacs 24

Emacs 24 is going to get a significant feature: Emacs Lisp will natively support lexical scoping.  This might not have impressed you much, and to a certain degree you'd be right: it doesn't imply any visible UI or behavioral improvements, nor does it even imply any changes the end user can appreciate directly.  Nevertheless, the indirect consequences of introducing lexical scope support are hard to overestimate.

With lexical scope supported, a whole bunch of powerful abstractions becomes accessible, giving programmers still more powerful instruments for building complicated programs and implementing fantastic features in Emacs.

The 24th version of Emacs has not been released yet, but alpha versions (pretests) have been available at the GNU FTP server (ftp://alpha.gnu.org/gnu/emacs/pretest) since autumn 2011, and they have lexical scoping implemented.  (At the time of writing, the latest pretest is 24.0.96; I use 24.0.95.)  So there's no reason for Elisp programmers to waste time: pick up lexical Elisp right now!

I devote this post to those who have started to play with lexically-scoped Elisp or are determined to start.

Lexical scoping vs. dynamic scoping

First off, dynamic scoping is pervasive in Emacs, and it is not possible to abandon it completely (nor would that be reasonable).  In fact, Emacs 24 supports both lexical and dynamic scoping.  How do they coexist, then?

To put it shortly, there's a buffer-local variable named "lexical-binding" that is essentially a switch between lexical and dynamic scoping.

As for the interpreter: when it evaluates a lambda and the value of lexical-binding is t, it creates what the Elisp manual calls "a closure".  What is meant is just a list whose car is eq to the symbol "closure".  The second element of the list serves as the captured lexical environment for that closure, in the form of an alist (association list).  This is followed by an ordinary lambda list and a body.  Here's an example of a closure:

ELISP> (let ((x 12))
         #'(lambda (y)
             (* x y)))
(closure
 ((x . 12)
  t)
 (y)
 (* x y))

To convince yourself that closures are nothing more than lists of a special shape, you can do the following:

ELISP> (fset 'func '(closure ((x1 . 1) (x2 . 1)) (x3)
                             (prog1
                                 x1
                               (setq x1 x2
                                     x2 x3))))
(closure
 ((x1 . 1)
  (x2 . 1))
 (x3)
 (prog1 x1
   (setq x1 x2 x2 x3)))



ELISP> (func 10)
1
ELISP> (func 10)
1
ELISP> (func 12)
10
ELISP> (func 13)
10
ELISP> (func 22)
12
ELISP> (func 109)
13

(The above is something like a finite-state transducer that returns its input sequence shifted back by two positions.)

As for the byte compiler, it is again the "lexical-binding" variable that chooses the way to go.  When the byte compiler generates code, it looks at that variable's value; if it is t, it generates code such that closures are created at runtime, in the form of ordinary byte-compiled function objects.

Look at the following function:

(defun make-multiplier (x)
  (lambda (y)
    (* x y)))

If we try to compile this, here's what we get:

byte code for make-multiplier:
  doc:   ...
  args: 257
0    constant  make-byte-code
1    constant  257
2    constant  "\300 _\207"
3    constant  vconcat
4    constant  vector
5    stack-ref 5
6    call      1
7    constant  []
8    call      2
9    constant  3
10    constant  "\n\n(fn Y)"
11    call      5
12    return     

So make-multiplier calls make-byte-code at runtime to create a fresh byte-compiled function object and returns it.  Each call to make-multiplier leads to a new, distinct function being created and returned, each closed over a distinct location named "x" in the code above.  This is exactly how closures and lexical scoping are supposed to work.
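In use (nothing here is specific to byte compilation; the interpreted version behaves the same):

```elisp
(setq double (make-multiplier 2))
(funcall double 21)   ;=> 42
(setq triple (make-multiplier 3))
(funcall triple 10)   ;=> 30 -- a distinct closure over a distinct x
```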

When lexical-binding is nil, the old behavior is in effect: the interpreter creates simple lambdas instead of closures, and the byte compiler compiles function objects at byte-compile time rather than arranging for them to be created dynamically at runtime.

Defvar

When lexical scoping is on, the "defvar" special form can be used to make particular bindings dynamic instead of lexical (the default).

I haven't found a thorough explanation of how defvar works with regard to lexical scoping, so I did some research of my own and will now try to share the results with you.

Value is present

First of all, it must be noted that a new symbol attribute has been introduced: every symbol now has a bit flag indicating whether that symbol names a special variable or not.  You can read that flag with the "special-variable-p" predicate.  The latter's source in C is very simple; here it is:

DEFUN ("special-variable-p", Fspecial_variable_p, Sspecial_variable_p, 1, 1, 0,
       doc: /* ...  */)
  (Lisp_Object symbol)
{
   CHECK_SYMBOL (symbol);
   return XSYMBOL (symbol)->declared_special ? Qt : Qnil;
}

So it just checks a declared_special bit field.
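From the REPL:

```elisp
ELISP> (special-variable-p 'some-random-symbol)
nil
ELISP> (defvar my-var 10)
my-var
ELISP> (special-variable-p 'my-var)
t
```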

Second, when you evaluate "(defvar <var> <val>)", the variable <var> becomes special; that is, whatever symbol you put in place of <var> gets its declared_special bit set.  For example:

ELISP> (defvar my-var 10)
my-var
ELISP> lexical-binding
t
ELISP> (let ((my-var 12))
         (lambda ()
           my-var))
(closure
 (t)
 nil my-var)

See that the closure hasn't captured my-var?  In fact, let just checks whether my-var has been declared special by looking at its declared_special field.  In this example my-var really is a special variable, so let binds it the same way it did in all versions of Emacs prior to 24.

As for byte compilation, the byte compiler behaves similarly: if it sees a "let" form binding a variable which has declared_special bit set, it generates the code which performs dynamic binding of that variable at runtime (with a byte code primitive named "varbind").  Otherwise the binding is lexical, and the code being generated just manipulates the stack storing values there, without any changes to the symbol-value slot of the variable being bound.

Value is not present

That's not all, though.  When you evaluate "(defvar <var>)" without specifying a value, defvar does not set the declared_special flag of <var>.

When interpreting Elisp code, defvar just remembers that <var> is special by pushing it onto the alist stored in the internal variable Vinterpreter_internal_environment.  (This variable is not exposed to the Elisp layer at all; it is just a C variable.)  This alist has the same form and semantics as the captured lexical environments of closures mentioned above: a cons cell (VAR . VAL) means that VAR is a lexical variable currently in effect, with the value VAL, while a plain symbol (non-cons) in the alist means that this symbol is a dynamically-bound variable and should be bound dynamically by all subsequent lets.

So, now you understand how the interpreter works regarding lexical environments: it just tracks (with Vinterpreter_internal_environment) the environment during the evaluation process by augmenting it within let blocks and in defvar forms.  When the interpreter has to create a closure, it does so by giving it the current value of this Vinterpreter_internal_environment variable.  That value is what is captured by the closure.

Let's see something concrete:

ELISP> (setq x
             (let ((w 10))
               (defvar w)
               #'(lambda (y)
                   (+ w y))))
(closure
 (w
  (w . 10)
  t)
 (y)
 (+ w y))
ELISP> (funcall x 12)
22
ELISP> w
*** Eval error ***  Symbol's value as variable is void: w

The closure has captured a lexical environment "(w (w . 10) t)".  We'll examine it from end to beginning:

  1. t -- for some reason, all closures contain t there.  It says that the variable "t" is dynamic (not captured).  This is by all evidence just an implementation technique chosen by the people who worked on this; you can simply pay no attention to those captured t symbols.
  2. (w . 10) -- this was pushed onto Vinterpreter_internal_environment by let.
  3. w -- this was added by defvar.  This entry does not shadow the previous one, and the resulting closure captures "w" with the value of 10.
Now see what happens if we move defvar outside of let:

ELISP> (progn
         (defvar w)
         (let ((w 10))
           #'(lambda (y)
               (+ w y))))
(closure
 (w t)
 (y)
 (+ w y))
ELISP> (funcall (progn
                  (defvar w)
                  (let ((w 10))
                    #'(lambda (y)
                        (+ w y))))
                10)
*** Eval error ***  Symbol's value as variable is void: w

In this case, the closure's captured environment doesn't contain "(w . 10)" at all, since by the moment the let is evaluated, the interpreter already knows that "w" has been declared dynamic (by appearing in Vinterpreter_internal_environment as a plain symbol).

Valueless defvar and byte compilation

The byte compiler also treats valueless defvars specially in some cases.  But don't expect it to provide the same level of convenience as Common Lisp does; after all, this functionality has not even been released yet, and Elisp is not a general-purpose language.

To catch the point, let's start with examples:

(defun make-multiplier ()
  (defvar m1-x)
  (let ((m1-x (buffer-size)))
    #'(lambda (m2)
        (* m1-x m2))))

This is compiled into:

byte code for make-multiplier:
  doc:   ...
  args: 0
0    constant  buffer-size
1    call      0
2    varbind   m1-x
3    constant  make-byte-code
4    constant  257
5    constant  "\300 _\207"
6    constant  vconcat
7    constant  vector
8    varref    m1-x
9    call      1
10    constant  []
11    call      2
12    constant  3
13    constant  "\n\n(fn M2)"
14    call      5
15    unbind    1
16    return     

So, when a variable has been declared special with defvar and then bound with let, what we end up with is a dynamic binding ("varbind m1-x").  This behavior is the same as in the interpreted case.
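That the let binding really is dynamic can be observed from outside the function.  A minimal sketch (the helper "peek" is hypothetical, added purely for illustration):

(defvar m1-x)                ; declare m1-x special, no value

(defun peek ()
  m1-x)                      ; free reference: looked up dynamically

(defun show-binding ()
  (let ((m1-x 7))            ; dynamic binding because of the defvar
    (peek)))

(show-binding)               ;; => 7: peek sees the let-bound value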


Now try to evaluate (M-:) this:


(disassemble (make-multiplier))



You get something like:

byte code:
  doc:   ...
  args: 257
0    constant  349
1    stack-ref 1
2    mult     
3    return


This means that m1-x was captured by the closure returned from make-multiplier, although it was bound dynamically by the enclosing let.  This behavior is the opposite of the interpreted case.  Surprising, isn't it?


Putting "(defvar m1-x)" at the beginning of the lambda body doesn't change anything, nor does inserting "(defvar m1-x)" at the beginning of the let help.  In all cases m1-x is captured as a lexical variable.


The conclusion is this: when the byte compiler decides whether to capture an enclosing variable, it considers only the variable's declared_special attribute.  Place a defvar with a value specified, and you will see that m1-x is no longer captured as a lexical variable.
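For instance (a sketch; the exact disassembly may differ between Emacs versions):

;; With a value, defvar sets declared_special globally:
(defvar m1-x 0)

(defun make-multiplier ()
  (let ((m1-x (buffer-size)))     ; now compiled as a dynamic binding
    #'(lambda (m2)
        (* m1-x m2))))

;; (disassemble (make-multiplier)) now shows "varref m1-x" inside
;; the lambda instead of a captured constant: m1-x is not lexical.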


So how do we use all that?  Conclusion

I would recommend the following approaches to dealing with scoping:


1. If you don't need lexical binding in a particular file (=library), just don't use it and that's all.  Otherwise, see item 2.


2. If you do need lexical binding, then add a file-local variable "lexical-binding" (use "add-file-local-variable-prop-line" for that).  You'll now have lexical scoping in effect by default.


3. If you want to create a (possibly internal within your library) global variable, feel free to do this at file level with defvar.  Your variable will be dynamic throughout your Emacs system: it won't be captured by closures and let-bindings of it will act as dynamic ones.  The same is true about defconst, by the way.


4. The previous three items cover most typical Elisp programming cases and needs.  If you want to be able to use something like:


(let ((y 10))
  (declare (special y))
  ...)

as you would do in Common Lisp, and get equivalent behavior and semantics, then just forget about it and look for other ways around :)  Keep in mind that defvar won't do the same job for Elisp as the "special" declaration currently does for CL; defvar is much less powerful.
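To illustrate items 2 and 3, here is a sketch of a library skeleton (all names are hypothetical):

;;; mylib.el -*- lexical-binding: t; -*-

;; File-level defvar: mylib--counter is dynamic everywhere,
;; so closures never capture it.
(defvar mylib--counter 0
  "Internal counter for mylib.")

(defun mylib-make-adder (n)
  "Return a function that adds N to its argument."
  ;; N has no defvar, so it is lexical and gets captured.
  (lambda (x)
    (setq mylib--counter (1+ mylib--counter)) ; dynamic reference
    (+ x n)))                                 ; lexical capture of N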

Friday, March 16, 2012

Introduction



1. What is this blog about?
2. Elisp is fun.
3. Elisp is a specialized Lisp dialect.
4. Conclusions and plans for the future.

What is this blog about?

In this blog, I intend to share my experience in Emacs Lisp programming and Emacs scripting, as well as some thoughts, criticism and ideas about the art of programming in general.

I assume the audience of this blog to be, first of all, Common Lisp gurus and advanced Emacs (Lisp) users.  But beginners are by no means discouraged from reading my posts and leaving comments; in fact, I don't consider myself an advanced lisper (rather, quite far from it).  I say this only because the things I'm going to post here seem to me to qualify as "advanced Emacs Lisp programming", in comparison with what is typically done in this language.


Elisp is fun

First of all, why Emacs Lisp (Elisp), and what is so special about this language?

I do almost all of my spare-time programming in Elisp inside the GNU Emacs text editor, and it is extremely interesting and exciting.  A major part of the fascination comes from the nature of the language: it is a Lisp dialect, and therefore shares the principal trait of the whole Lisp family.  Namely, the ability to write code that writes code, which naturally leads to the possibility of creating any syntax and using it, without having to invent another compiler for that, and without ever being stuck with any particular set of predefined language tools (which is surely the case with almost all other programming languages).

But Elisp is not enjoyable solely because it is a Lisp dialect.  The environment is also excellent.  First, Emacs is itself written in C and Elisp, and all the facilities it offers are implemented in Elisp (barring the primitives, which are implemented in C).  This makes Emacs remarkably unified and consistent.  Second, it is fully interactive: you can use a REPL, run arbitrary pieces of code directly from the editor, and define, compile, and call Elisp functions right away with a single keypress.  You will never have to wait half an hour for your project to build just to call a single function once.

In Elisp, as in Common Lisp or Scheme, there's no place for routine work, boring coding, copy-pasting or boilerplate code.  But the room for creative activity (inventing, discovering, writing and rewriting) is unlimited.

Elisp is a specialized Lisp dialect

I believe that Elisp is one of the most underestimated programming languages ever devised (the other being JavaScript, but let's leave that for now).  It is perfect for various text-processing tasks, since it contains primitive facilities designed specifically for them.  In the fields of scripting and text processing, Elisp is perfectly able to compete with Perl or Python, with the difference that all Elisp-based facilities can be seamlessly integrated into the Emacs environment and used interactively.

So, Elisp is a specialized Lisp dialect invented to be an intrinsic and inseparable part of a single particular application.  But this fact doesn't prevent it from being excellently usable for certain kinds of general-purpose programming.  I strongly believe this, and I want to popularize Elisp by sharing my experience of using it for miscellaneous programming tasks.  Elisp should evolve and become better; it is worth putting effort into its evolution.


Conclusions and plans for the future

As a conclusion, I want to mention some concrete topics I plan to devote my future posts to.

First, one of the most topical things concerning Elisp at the time of writing is the introduction of lexical scoping into the language.  Presently, it is already available in alpha version 24.0.94.  Full acceptance of lexical scoping seems to open new horizons for Elisp programmers and to change the way Elisp code has been hacked until now.

Second, it would be interesting to write about my experience creating algorithms for simple games like Sudoku or Paint-by-Numbers (Japanese crosswords).  These involve many interesting questions.

I hope to come up with the second post within the next couple of weeks.

Good luck!