
Delayed Publication & Postmortem & Potential Reanimation: AutoCLaracterization Library

Introduction

Last summer – that is, 2023 – I wrote version 0.1 of AutoCLaracterization! “Automatic characterization tests in Common Lisp (targeting 5am). Or almost automatic, at any rate.” For more information about AutoCLaracterization as-is, go to the linked page. For more information about the development of it – read on! If you’re a human reader experiencing a sense of déjà vu at the format of this introductory paragraph… Well, that’s the joke I just ruined by pointing it out.

Project Conception & Raison d’Être

Being handed the maintenance of an existing system at the time, I wisely decided to get the system under testing to guard against regressions, and hence started by writing an automatic characterization test generator. This led to a 93% speed-up in achieving 95% test coverage of the handed-over code, if we discount the initial time spent developing it, further leading to – nah, I’m just yanking your chain.

During dinner with a colleague in Summer 2023 during an on-site at work, he pointed out that the lack of unit tests et al. in some of the systems he was maintaining made for fiddly progress. After all, it’s hard to catch regressions – especially early – without tests. I teasingly responded that per Michael Feathers’ “Working Effectively with Legacy Code”, which I knew my colleague was familiar with, the first thing to do when taking over a codebase (unless it happens to be well-tested already) is to get it under characterization testing. And since characterization tests simply describe the behaviour of the system, surely my colleague could simply write a program to program the test coverage for him. Lisp offers great program-writing program programming capacities, after all. My colleague, naturally, took some minor playful offence at the suggestion, and talked a bit about not liking macros overmuch as they made code hard to follow. A pleasant dinner conversation with light banter, in short.

To my surprise, I couldn’t get the idea out of my head. It should be possible to rewrite some choice macros – namely the ones wrapping all typical code, i.e. defun, defgeneric, defmethod – and make a system characterize itself. After a couple days unable to get rid of the idea, I gave in and sorta feverishly developed the thing. Took a bit less than three weeks from start to finish. Three delightful weeks, I might add, as I was basically in feverish flow the entire time, pouring my free time into it.

Project Order

I started out by delving right into the main meat of the AutoCLaracterization library. That is, by implementing defrecfun as a macro with the same interface as defun – but which, additionally, would instrument things so the function calls would punt test forms into storage somewhere. The idea was to instrument a codebase by going in and exchanging choice defun forms with defrecfun ones. Simple and easy, even if a wee bit of hassle. So since the resultant function (after defrecfun instrumentation) would have to generate invocation forms of itself, I wrote a function to generate the forms necessary for it, with the cute name of generate-generate-invocation-form. I suppose I missed out for not going whole hog with generate-generate-invocation-form-form, really. Oh, and I had to do quite the deep dive into the various possible combinations of lambda-list-keywords to make sure the instrumentation was good.
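A usage sketch may help picture it. Everything below is illustrative rather than lifted from the library – the recorded test form in particular is imagined, since its exact shape depends on the configured test strategy:

```lisp
;; Hypothetical usage sketch: DEFRECFUN mirrors DEFUN's interface, so
;; instrumenting a function is a one-word change in the source.
(defrecfun square (x)
  (* x x))

;; Calling the function computes as usual, but additionally punts a
;; characterization test form into storage -- imagine something along
;; the lines of (is (equal 49 (square 7))) landing in a recordings
;; collection, ready to be dumped out as a 5am test later.
(square 7)
```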

Since the instrumented functions were to generate testing code, that meant serialization to Lisp code — something SPICLUM also deals with. So I copy-pasted the serialization code over – what a delight, problem solved to the best of my ability (since the stuff SPICLUM can’t serialize is the stuff that lacks ready introspectability). Once defrecfun seemed to work, I added defrecgeneric and defrecmethod (well, after (normative!) tests, mind).

In a sense, that more or less completed version 0.0, which only had the macro approach. However, I had also been discussing with another colleague, who mentioned that he wasn’t too fond of the idea of having to go into the code base and change various forms, since it’s fiddly. How can you save the instrumented forms if you usually want them uninstrumented, for example? So he recommended I check out *macroexpand-hook* and look into making the instrumentation live in its own code files, because that way it would be easy to reuse etc. He convinced me of the superiority of that approach, and hence I set about implementing that for version 0.1 as it were. *macroexpand-hook* is pretty neat, and offers a way to seize control over how macroexpansion works – in my case, I wanted to conditionally reroute defun/defgeneric/defmethod into defrecfun/defrecgeneric/defrecmethod as relevant. There was a minor fiddly hiccup in that I also needed to keep track of the few previously seen forms so as to avoid infinite loops, but overall implementation of that rerouting was relatively painless.
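To illustrate the mechanism, here is a minimal sketch of such a hook – not AutoCLaracterization’s actual code; should-instrument-p is an invented predicate standing in for whatever the register-recorder bookkeeping does:

```lisp
;; Minimal sketch of conditionally rerouting DEFUN via *MACROEXPAND-HOOK*.
(defparameter *old-hook* *macroexpand-hook*)

(defun rerouting-hook (expander form env)
  (if (and (consp form)
           (eq (first form) 'defun)
           (should-instrument-p (second form))) ; invented predicate
      ;; Expand the rewritten recorder form instead of the original.
      (macroexpand-1 `(defrecfun ,@(rest form)) env)
      ;; Otherwise defer to the previous hook (FUNCALL by default).
      (funcall *old-hook* expander form env)))

(setf *macroexpand-hook* #'rerouting-hook)
```

Note that the loop guard mentioned above is omitted here: since the recorder macro presumably expands into a plain defun itself, an unguarded hook like this one would reroute that inner defun all over again, forever – hence the library’s tracking of previously seen forms.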

And once the rerouting into the recorder forms through *macroexpand-hook* was done, the rest of the project time was basically spent writing tests and documentation, plus some bugfixes.

Project Postmortem

The entire development – including documentation – took place between June 3 and June 22. I imagine I poured all my free time into it at the time. It’s rather a brutal graph:

GitHub graph of code frequency over the history of arthev/autoCLaracterization, showing the approx. 3-week project cycle and the very spiky behaviour of the overall LOC volume

In particular, it’s 39 commits, 2,833 LOC additions, and 695 LOC deletions. That’s about 150 LOC additions per day on average (though the graph above does look spiky) – surprisingly high for a metaprogram that felt, well, complex to deal with and used a mechanism I hadn’t touched before (*macroexpand-hook*). Not like writing straightforward db queries where you fetch some data, do some joins and some filtering, and suddenly have 80 LOCs and wonder why something that simple takes so much space.

The LOC cost allocation is as follows: 939 LOCs of test code, 954 LOCs of program code, and 214 LOCs of documentation. The testing code is a bit funny: Since AutoCLaracterization exists to generate 5am tests, and the testing code uses 5am, it’s 5am tests of whether certain forms result in 5am code forms that look as expected… And since the action happens as a result of macro definitions, there are e.g. helpers like load-defrec that take a form and turn it into the appropriate recorder macro version and then evals the resultant macro form… What else is there to do?
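A sketch of what such a helper might look like – this is my reconstruction for illustration, not the library’s actual code, and the definer-to-recorder mapping is assumed:

```lisp
;; Hypothetical reconstruction of a LOAD-DEFREC-style test helper: rewrite
;; a definition form into its recorder counterpart and evaluate it, so the
;; test can then invoke the result and inspect the recorded test forms.
(defun load-defrec (form)
  (destructuring-bind (definer . body) form
    (let ((recorder (ecase definer
                      (defun     'defrecfun)
                      (defgeneric 'defrecgeneric)
                      (defmethod 'defrecmethod))))
      (eval `(,recorder ,@body)))))
```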

Did the project succeed? Sure. The main intended project outcome was to give me peace of mind by getting the idea out of my head and into the computer so I’d have space to think again. Mission accomplished.

Successes

Since the thing works, I now have a library to automatically put a Common Lisp codebase under characterization testing. Obviously, there are caveats. The design tests the functional interface resulting from defun/defgeneric/defmethod, but that means the library doesn’t test side-effects, which is – depending on the codebase – a significant shortcoming.

Further, though I mentioned above that the test code for the library is rather odd given the need to evaluate macro forms during the tests, I found a set of helpful utilities that made writing it rather smooth, as I recall. And despite the test code amounting to rather a lot of LOCs (especially as a ratio relative to the actual library LOCs), I think that’s mostly because the comparison values simply take space by virtue of being code forms themselves.

The initial implementation of the defrecfun/defrecgeneric/defrecmethod macros was also pleasant. Most things are pleasant when you can create something in a state of feverish flow, though. They also seem pretty clean to me – there’s a generate-function-body helper function that does much of the heavy lifting. Abstracted out of the initial defrecfun macro, no doubt, but the fact that defrecgeneric uses that same function for most of the heavy lifting, and then defrecmethod uses defrecgeneric for its heavy lifting… Well, it seems neat and tidy to me! Instead of hardcoding the sort of tests to generate, or the strategy of handling instrumented function calls inside instrumented functions, I made those choices part of the library interface and simply chose what seemed the sanest defaults to me.

Obviously, being able to simply snarf my serialization code from the SPICLUM library and repurpose it with one quick copy-paste was nice. It implies I could consider separating the serialization code out as its own mini-library though, and just make it a dependency for SPICLUM and AutoCLaracterization both. We’ll see – certainly, if I end up adding new serializations that make sense in both contexts, it would seem extra tempting.

Another point worth noting: To instrument generic functions, I decided I wanted an around method that would handle the instrumentation. But of course, that might collide with existing around methods. That was a headscratcher for a bit before I realized that what I wanted was a superaround method that went around everything! Too bad there aren’t any such things in the Lisp standard. Sike!

(define-method-combination superstandard ()
        ((around (:around))
         (superaround (:superaround))
         (before (:before))
         (primary () :required t)
         (after (:after) :order :most-specific-last))
  "SUPERSTANDARD method combination to wrap a single :superaround around an
otherwise standard method combination using generic. Hence instrumentive behaviour
can be added without modifying any existing DEFMETHOD forms. Balks if multiple
:superaround exist for a given generic function."
  ...)

I suppose that means there’s an implicit restriction of using defrecgeneric for generic functions with standard method combinations. Could be expanded to handle other method combinations on a case-by-case basis by adding further method-combinations and mapping between X and superX, but hey, I’m not an oracle. As a side note, I think this was my first time writing a new method-combination.
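For illustration, usage under this combination might look like the following – the generic function, the circle class, and the recorder call are all invented for the example:

```lisp
;; Hypothetical example of the SUPERSTANDARD combination in use.
(defgeneric area (shape)
  (:method-combination superstandard))

(defmethod area ((c circle))
  (* pi (expt (circle-radius c) 2)))

;; The instrumentation can now live in a single :superaround method that
;; wraps everything -- including any existing :around methods -- without
;; touching the original DEFMETHOD forms.
(defmethod area :superaround (shape)
  (let ((result (call-next-method)))
    (record-characterization 'area (list shape) result) ; invented recorder
    result))
```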

The biggest successes were the conversations with my colleagues, however. Both the initial seed conversation, with the teasing banter leading to an interesting idea. Serendipity and all that jazz. But also the subsequent conversations with the other colleague, getting the critique of the design and the recommendation of putting *macroexpand-hook* to use. I think it’s led to a much niftier overall interface, and the approach lets you store the instrumentation forms, which is more maintainable than constant jiggly modifications of the codebase. It’d be remiss not to mention that the mild exasperation of the original banter colleague when I informed him of my progress on the implementation also served as a small motivating aspect. Overall, I suppose it’s a situation analogous to Richard Hamming’s comment in You and Your Research: “I noticed the following facts about people who work with the door open or the door closed. I notice that if you have the door to your office closed, you get more work done today and tomorrow, and you are more productive than most. But 10 years later somehow you don’t quite know what problems are worth working on; all the hard work you do is sort of tangential in importance. He who works with the door open gets all kinds of interruptions, but he also occasionally gets clues as to what the world is and what might be important.” Which is, of course, an instantiation of the old proverb: “Iron sharpens iron.”

Failures

AutoCLaracterization has two obvious big shortcomings: For one, it inherits all the serialization shortcomings from SPICLUM. Secondly, it doesn’t handle side-effects and instead solely characterizes functional interfaces.

Further, since the approach AutoCLaracterization takes is to essentially redirect certain macro invocations into other ones, it requires recompiling the entire system under instrumentation once the image has been instrumented with the register-recorder invocations. This recompilation is fine for the sort of use cases I have, but it might annoy someone else. In particular, it means you can’t instrument something you don’t have the source code for.

The funniest failure is that I developed the library at home, in my free time, using SBCL. I then started verifying it works in Allegro at the end of development, only to then find out Allegro shortcuts macroexpansion for a whole bunch of standard Lisp macros. Quoting from https://franz.com/support/documentation/10.1/doc/implementation.htm: “The value of macroexpand-hook is coerced to type function before being called, and therefore may be a symbol, function, or lambda expression. Allegro CL has always permitted these but macros and symbol macros expanded directly by the compiler (and not indirectly by other macros) don’t go through macroexpand-1 and consequently don’t invoke macroexpand-hook. While the specification is somewhat ambiguous, this should probably be considered a bug.” I am tempted to contact support and ask whether something could be done about this (not necessarily affecting default behaviour, but optionally permitting the *macroexpand-hook* approach to arbitrary instrumentation would be neat). Of course, even if not, there are ways to force the desired behaviour – I have in mind the reader-using alternative I discuss below.

Summa Summarum

The project was a blast: I learnt about *macroexpand-hook* and wrote a first method-combination and dealt with testing something macro-driven and had interesting conversations with my colleagues. Further, since test-driven development is a whole fancy process methodology, I suppose this project might technically form the seed of a counterpoint process methodology: banter-driven development. Other nice alternative process methodologies of course include what some professor told me and some other people at a mostly machine learning-oriented conference I attended in 2018: He had been writing something in Prolog, and it didn’t work, and he just couldn’t figure out what was wrong and started – in anger – deleting stuff at random, venting. And then he suddenly stopped and peered at the screen and ran the remaining program, and it worked as it was supposed to in the first place. So that was his one instance of rage-driven development. Probably not the best methodology, but hey, worked for him.

I haven’t put AutoCLaracterization to work yet – it was basically written as an oddball mix between ‘practical joke’, ‘serious effort at making something interesting’ and ‘feverish obsession’, so it’s not like I had an explicit use-case in mind to start with either. However, I’ve recently – over the last N months – been handed some systems that are largely functional and might be amenable, so I am on the lookout for opportunities to actually try the thing out.

I suspect everyone’s familiar with this pottery anecdote/parable at this point, but given that mental model, or, for that matter, the model of ‘create tight feedback loop’ that underpins deliberate practice: It’s nice to have another library under my belt. I also had a laugh in a team meeting perhaps a week after I presented the library at work (as a ‘hey, I made this thing I think’s interesting’ thing) – when the colleague I had originally bantered with started complaining about something (totally different), and a colleague (not the one who’d recommended I look into *macroexpand-hook*) joked that the complaining colleague should be careful, or I might just end up going off and writing another inane library. High praise, in my book. I know I’ve read some decent resource arguing for how the act of going through full projects is crucial for learning/excellence/mastery, what have you, but I can’t remember where/what/which. Probably a mental amalgamation of reading too many chestnuts on that topic.

Now, one of the pieces of feedback I got when I presented AutoCLaracterization more broadly to multiple colleagues at once was a handful of other plausible approaches. One of those is to use the advice facility Allegro has. Would probably work, though I haven’t looked into the particulars. I do like the current implementation of AutoCLaracterization, after all, and I in part like it for using general mechanisms. I suppose those mechanisms are the sort you could use to implement an advice mechanism if you really wanted to, though. Another plausible implementation approach would’ve been to double down on the macro approach, at the basic level, by shadowing defun et al. Would’ve worked, obviously – sort of an analogue to the external instrumentation approach, and would’ve needed some syntactic marker of which functions to instrument or not. The third alternate approach – and this is one in part spurred by the observation that Allegro cheats by bypassing *macroexpand-hook* – would be to intercept forms at read-time rather than at macroexpansion time. Hell, add a *read-hook* through injection in each read form if you want to. Anyhow, the implementation can’t cheat at macroexpansion time if the form gets quietly converted to be an invocation of a different macro prior to macroexpansion…

Finally, we come to the ‘future work’ section. One of the major caveats of AutoCLaracterization is that it doesn’t instrument for side-effects. I don’t think that’s a necessary constraint, it’s just accidental to what my initial ideas were and how far I wanted to take them (primarily out of head and into silicon, also known as one desk-distance). Sure, in a multi-threaded context, instrumenting side-effects probably becomes non-deterministic garbage. But under assumptions of (local) sequentiality… Well, if we could store the value of e.g. affected globals prior to the first instrumented modification, and after the whole lot of them for the instrumented function call stack (or each individual function call, design decision), then that gives you automatic characterization of side-effects. I guess things like file interactions might be characterizable as well. Now, AutoCLaracterization works by rerouting a select few macros for extra instrumentive action… and the prime observation is that setf is a macro. Tempting, tempting…
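To make the temptation concrete, a crude sketch – everything here is invented, and a real version would have to expand arbitrary places properly (e.g. via get-setf-expansion) rather than read the place twice:

```lisp
;; Speculative sketch: reroute SETF into a recording variant that snapshots
;; the place before and after mutation. RECORD-SIDE-EFFECT is invented.
(defmacro recording-setf (place value)
  (let ((old (gensym "OLD"))
        (new (gensym "NEW")))
    `(let ((,old ,place)   ; snapshot prior value (fine for simple places,
           (,new ,value))  ;  not for places with effectful subforms)
       (record-side-effect ',place ,old ,new)
       (setf ,place ,new))))
```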

Big thanks to all of my colleagues mentioned above. I’m sure you can place yourselves. But why are you reading my blog?
