Learning Vectors: Up and Down
— a Meditation
Abstract
A meditation that was intended to be about learning vectors (up and down) by contrasting SICP and NAND2TETRIS, but which instead relegated them to mere side notes in favour of pondering learning vectors and their implications for CS in general.
Intro
I have been wondering about the differences between top-down and bottom-up learning lately: How do they compare? Are there bounds, things you can only learn from one style? Are certain things easier to learn with a specific approach? Do domains differ in the suitability of either learning strategy? What differences are there in the resultant perspectives of a downwards or upwards learning journey?
A preliminary observation: Learning can be done in ways that are neither top-down nor bottom-up – e.g. adjacent learning, where subtopics are (loosely) related without building on each other in either direction (grammar?); gradual refinement, where learning improves the performance of some action (walking?); piecemeal learning, where there’s no vector whatsoever.
Anyhow, our main concern here is the comparison between top-down and bottom-up learning. Mathematics is typically taught bottom-up: start with natural numbers and addition, introduce more number classes and operators, abstract numbers into algebra, abstract operators into functions, introduce calculus as operations on functions. Maybe introduce some geometry, set theory or probability along the way.
In comparison, programming is typically taught in an adjacent learning style: first imperative programming, then OOP, maybe algorithms & data structures, maybe some computer architecture. A lot of applications (e.g. databases, networks), and a lot of practical projects and so-called software engineering. Significant time is typically spent on learning library usage. It’s fairly piecemeal. The course structure of universities might be partly to blame: anecdotally, I note that the professor I had in Celtic literature used to complain about how universities had moved from multi-semester topics/fields to self-contained one-semester courses.
Top-Down
Let P be a program in a Turing-complete high-level language L, hand-written on paper. Note that P is a program despite not being on a computer. So P is an abstract symbol manipulator, expressed as a potentially elaborate formula from some formal system L. L has both syntax, describing legal expressions, and semantics, describing the meaning of those expressions. Since L is Turing-complete, it can (in theory) be used to specify any possible symbol manipulation. Generally, the first step of learning programming is to learn one such language and practice utilizing it – that is, to gain skill in human symbol manipulation. Now, suppose we have written P for some practical endeavour: we would like to execute the program. How can we do that?
Well, we can feed P into a computer, which then turns the abstract symbol manipulator into a concrete one. How does the computer turn P from abstract to concrete symbol manipulator? It’s not enough to simply transcribe P into a text file somewhere. No, we need a different program to convert our program: either a compiler or an interpreter. Say we have a compiler. Then, in order for our computer to run P, the compiler translates P from L to some other language L’ – typically L’ is an assembly language. Then we can invoke a separate program, the assembler, to translate from L’ to L’’ – the machine language.
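As a minimal sketch of that translation step, consider an invented toy language (not any real compiler or toolchain): expressions in L are nested trees, and L’ is a stack-machine “assembly” of PUSH/ADD/MUL instructions. The compiler walks the tree and emits equivalent instruction sequences:

```python
# A toy compiler sketch. The "language" L here is invented for
# illustration: expressions are nested tuples like ("+", 1, ("*", 2, 3)),
# and L' is a stack-machine assembly of PUSH/ADD/MUL instructions.

def compile_expr(expr):
    """Translate an expression tree into stack-machine instructions."""
    if isinstance(expr, int):
        return [("PUSH", expr)]
    op, left, right = expr
    code = compile_expr(left) + compile_expr(right)  # operands first
    code.append(("ADD",) if op == "+" else ("MUL",))  # then the operator
    return code

print(compile_expr(("+", 1, ("*", 2, 3))))
# [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('MUL',), ('ADD',)]
```

The shape is the essential thing: the compiler never evaluates anything; it only rewrites expressions of L into (groups of) expressions of L’.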
Now we can note two interesting things: 1) Compilers and interpreters are programs, which means they have to be converted into machine language for execution at some point too. So, by necessity, there has been a “bootstrapping” process where a compiler or interpreter was writ in machine language. 2) We have gone from one interesting question (“How can a computer execute a program?”) to three: “How does a compiler do its translation work?”, “How does an interpreter interpret a program?” and “Suppose we have a program in L’’ – how does the computer execute that?”.
Briefly, a compiler translates P from L to L’ by mechanically working through the expressions in P and constructing (groups of) semantically equivalent expressions in L’ (for some meaning of semantically equivalent). Interpreters are programs that simulate abstract machines that can execute language L. Perhaps compilers and interpreters are typically relegated to advanced courses due to the meta-manipulation involved: writing one requires writing not just an abstract symbol manipulator, but an abstract symbol manipulator to manipulate abstract symbol manipulators.
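By contrast, an interpreter for the same toy expression language (again, an invented illustration) never translates at all: it simulates the abstract machine directly, computing the meaning of each expression on the spot:

```python
# A toy interpreter sketch for the same invented expression language:
# nested tuples like ("+", 1, ("*", 2, 3)). Instead of translating,
# it evaluates directly, simulating an abstract machine that
# accepts the language.

def interpret(expr):
    """Evaluate an expression tree and return its value."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    l, r = interpret(left), interpret(right)
    return l + r if op == "+" else l * r

print(interpret(("+", 1, ("*", 2, 3))))  # 7
```

Note the meta-manipulation even at this tiny scale: the interpreter is itself a program, whose input is other programs.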
How does the computer execute P in L’’? Here the OS typically comes in as an extra layer, with its job of coordinating different programs running on the same computer. Assuming the computer does not include microcode, instructions in L’’ correspond directly to the electrical signals to be sent through the computer for a given program. So, an interpreter is an abstract machine that accepts some language L. A physical computer is a physical machine that accepts some language L’’. Computer architecture as a field is about how to design these physical machines effectively and efficiently. Typically, a physical general-purpose computer is made as a Von Neumann machine (also known as the machines Von Neumann kinda threw together just to get something working to play with). Memory and storage, CPU, I/O…
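The core of the Von Neumann design can be sketched as a fetch-decode-execute loop over a single memory holding both instructions and data. This is a simulation only, with an invented three-instruction set; a real machine encodes instructions as bit patterns and drives actual circuitry:

```python
# A minimal sketch of a Von Neumann-style fetch-decode-execute loop.
# The instruction set (LOAD/ADDI/HALT) is invented for illustration.
# Note that instructions and data share the same memory.

def run(memory):
    acc, pc = 0, 0            # accumulator and program counter
    while True:
        op, arg = memory[pc]  # fetch and decode
        pc += 1
        if op == "LOAD":
            acc = memory[arg]  # load a data cell into the accumulator
        elif op == "ADDI":
            acc += arg         # add an immediate value
        elif op == "HALT":
            return acc

# Three instructions followed by one data cell (at address 3).
program = [("LOAD", 3), ("ADDI", 2), ("HALT", None), 40]
print(run(program))  # 42
```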
How are the pieces of the physical general-purpose Von Neumann machine made? That’s the field of digital circuit design, which is essentially the construction of large-scale functions in Boolean logic. So digital circuit design is math, expressed either as Boolean logic or through modeling with logic gates. How are logic gates made? That crosses over to electronics – the short answer is “by making little electronic circuits that try to force outputs either high or low, without in-between signal values”. However, semiconductor electronics is just one implementation of those logic gates. You can theoretically make them with, say, telegraph relays.
And by probing deeper still, you can continue through electrical engineering into particle physics and then into quantum mechanics. (And then, perhaps, build up again to quantum computers instead? Hmm.)
Bottom-Up
What is CS? A riveting question. Let’s skip it. Instead, let’s consider the following quote by Scott Aaronson:
“Isn’t physics the accepted academic path to understanding the universe? Well, physicists have what you might call a top-down approach: you look for regularities and try to encapsulate them as general laws, and explain those laws as deeper laws…
CS you can think of as working in the opposite direction. (Maybe we’ll eventually meet the physicists half-way.) We start with the simplest possible systems, and sets of rules that we haven’t necessarily confirmed by experiment, but which we just suppose are true, and then ask what sort of complex systems we can and cannot build.”
Suppose you want to make a general-purpose computer – what minimal starting point is, theoretically, sufficient? One answer is: a NAND-gate. From the NAND-gate you can make a full set of digital logic gates, and from there build the different parts of a computer. Once you have the computer, you can in theory bootstrap an assembler and a compiler/interpreter, and then you’re back in the high-level programming world. Of course, building everything out of pieces built from NAND-gates will (probably) in practice introduce too much noise to actually work. But if it works logically, and if the different NAND-built parts were replaced with more optimized implementations so that the real-world engineering constraints were taken care of, the end result would work as well.
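The first rung of that ladder can be sketched directly: model a NAND gate as a function on bits, then derive NOT, AND, OR and XOR using nothing but NAND. (This mirrors the opening project of NAND2TETRIS, though the Python modeling here is my own sketch, not the course’s HDL.)

```python
# NAND-completeness in miniature: the only primitive is nand();
# every other gate below is built purely from it.

def nand(a, b):
    return 0 if (a and b) else 1

def not_(a):
    return nand(a, a)            # NAND with itself inverts

def and_(a, b):
    return not_(nand(a, b))      # AND is inverted NAND

def or_(a, b):
    return nand(not_(a), not_(b))  # De Morgan: a OR b = NOT(NOT a AND NOT b)

def xor(a, b):
    n = nand(a, b)               # classic four-NAND XOR construction
    return nand(nand(a, n), nand(b, n))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", and_(a, b), or_(a, b), xor(a, b))
```

From gates like these, one builds adders, then an ALU, then registers and memory, and so on upward.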
However…
A top-down approach starts with a high-level language. A bottom-up approach starts, perhaps, at the Boolean logic and digital circuit design level, or perhaps at the computer architecture level, or perhaps at the electronics level, and then builds up towards a general-purpose computer, and then through to the OS, compilers/interpreters and then the high-level language platform. It seems to me that if the initial learning is done in a bottom-up way, there is a risk of keeping the execution model firmly in mind while learning to program, so that, rather than a motivating tour de force of system building, the bottom-up journey might instead create a drag.
This is in relation to the following Dijkstra quote:
“[Operational reasoning about programming is “a tremendous waste of mental effort”.] Not everybody understands this sufficiently well. I was recently exposed to a demonstration of what was pretended to be educational software for an introductory programming course. With its “visualizations” on the screen it was such an obvious case of curriculum infantilization that its author should be cited for “contempt of the student body”, but this was only a minor offense compared with what the visualizations were used for: they were used to display all sorts of features of computations evolving under control of the student’s program! The system highlighted precisely what the student has to learn to ignore, it reinforced precisely what the student has to unlearn. Since breaking out of bad habits, rather than acquiring new ones, is the toughest part of learning, we must expect from that system permanent mental damage for most students exposed to it.”
This relates to earlier comments in the same text, where Dijkstra notes that the way to reason about programs, or rather, formal systems in general, is to reason about them in terms of themselves. That is, the way to reason about P in L is to reason about P in L using L’s syntax and semantics, not to envision P’s possible execution under some particular execution model (like your computer). Colloquially: reason about programs as if they were magic incantations with well-specified and precise meanings, rather than by how computers happen to execute them, since CS is no more about computers than astronomy is about telescopes!
Verdict
Perhaps it’s best to postpone a bottom-up tour until later, once someone has learnt how to program and is already familiar with reasoning about programs as programs. Taking a top-down trip from language to compiler or interpreter fairly early seems useful to drive home the point that a language implementation is just another program – it seems an odd divide to artificially constrain yourself into only using tools and systems others devise for you, after all. Besides the autotelic enjoyment of grasping, say, an interpreter, there’s the pragmatic question: What if you need something someone hasn’t made? What if you aren’t used to looking for the holes in the solution space, and therefore can’t, at the appropriate time, recognize the use for another formal system?
SICP, as I have mentioned before, is a top-down tour from functional programming through minimally-stateful programming through interpreters to a register machine emulator. Given that a full work-through takes on the order of hundreds of hours, that seems like ample time to learn to reason about programs before the register machine emulator makes its appearance. PAPL seems to also be a programming introduction that discusses interpreters. In the opposite direction, NAND2TETRIS is a great project-based course that takes the student from a NAND-gate through assorted logic chips, to a working (simulated) computer, through assembly and an assembler, through a compiler to a small OS for a full-fledged simple computer environment. It seems to me that, if adequately prepared, taking a top-down/bottom-up trip immediately followed by a bottom-up/top-down trip might yield an interesting perspective.
I do think it’s a worthwhile exercise to take a bottom-up trip to gain different perspective at an appropriate point in one’s learning journey. It’s probably useful for astronomers to know a little about the telescopes they happen to use too, after all. The question then becomes: When is the appropriate point? Perhaps at a suitably late point so that the incidental computer studied can be viewed as a member of the class of objects that can execute a program, rather than as a model to base your thinking about programs on.
At this point, I can think of an obvious objection to the discussion so far: “If learning about a computer should be postponed so as to not muddle the student’s reasoning, shouldn’t learning about interpreters be postponed as well? After all, they are implementations of abstract machines that accept a given language. So there’s the risk of starting to reason about programs in the language as executed by the interpreter.” I think this objection is fair, and don’t have a good answer.
I would hope that the risk is mitigated partly by seeing clearly that the interpreter is also a program, and that the risk might be smaller to begin with because, the interpreter being a program, it doesn’t have the same psychological baggage of realness to it as a physical computer which actually happens to execute programs. That is, by following the show of an interpreter with: “Now, you made this interpreter as a way to run programs in language L: Is there any reason to suppose this is how the programs must be executed? Could you have implemented the interpreter differently? Would it actually behove you to drag this whole program into your working memory and then reason by executing your programs mentally, when you could reason about the programs as programs instead?”