Form and Content in Computer Science
This is an episode of The Eric Normand Podcast.
Subscribe: RSS · Apple Podcasts · Google Play · Overcast
In this episode, I excerpt and discuss the 1969 ACM Turing Award Lecture by Marvin Minsky.
Transcript
Compilers are not concerned enough with the meanings of expressions, assertions and descriptions. The use of context-free grammars for describing fragments of languages led to important advances in uniformity, both in specification and in implementation. But although this works well in simple cases, attempts to use it may be retarding development in more complicated areas.

Hello, my name is Eric Normand, and today we are reading the 1969 Turing Award Lecture, given by Marvin Minsky. Now, just for context, I like to read the biography. They have a very short one; this one's about two printed pages. I look at when he was born, 1927, and he got the award in 1969. So he was 42 years old. Now, I bring that up because I think that that's less than the previous Turing Award laureates. So let me look that up. I'm going to just look up the ages of the others. Let's say the one who came before him. So I'm looking it up: Alan Perlis was born in 1922. Okay, so he was five years older, and he got the award three years before. Maurice Wilkes, 1913. Okay, so that's 14 years older. And then Richard Hamming, 1915. So that's 12 years older. Okay, I was just checking. I mean, he was young, 42. I say that, you know, young. I'm 39 right now, and I couldn't imagine getting anything as prestigious as a Turing Award in the next three years. So there you go. That's quite an achievement, I think.

And, well, when I read these, I've said this before, but I often like to have a question in mind. Like, what is this? What can this paper teach me? What is this paper about? Often there's so much going on in it. Something to help focus it, sort of a research question. Like, what can I learn from this? I'd like to ask a question. And this one was kind of hard because it's kind of rambling. It kind of goes all over the place. So I gave it a good try, and this is what I came up with. Okay, this is kind of the thesis, right? So this is in Marvin Minsky's voice: I got this award, and you didn't. I must see things differently. And so I'm going to try to explain what I see, and why, and why you're wrong. Okay, maybe that's not generous enough. But when I've watched talks that he's given, and interviews, that's kind of the feeling I get. And sometimes they'll ask him, like, what did you think growing up? And he'll say something like, I thought I was slow, but then I thought everyone else was much slower. You know, something like that. So, you know, I can see how a smart person who wasn't trying to be polite would say something like that.

And I think that normally you might not be recognized. People in general might totally ignore you, or at best just disagree with you and dismiss you. And then you get this award, this special opportunity where you've been recognized and you have an audience. And so you can say, look, I've done something important, and let me tell you about it. Normally you're not listening to me, but if you're going to give me the platform, this is what I'm going to tell you. Okay. In my mind, Marvin Minsky is the kind of, like, legendary person that everyone sort of looks up to and has this respect for, but then they don't follow in his footsteps, let's say. I'll put it that way. And so I'm going to get into the paper with that kind of focus on it: what can we learn about how he sees the world differently, that allowed him to get the award at 42? And what does he see as the future of computation?
So let me just go over a few points in this biography. He is cited for his central role in creating, shaping, promoting, and advancing the field of artificial intelligence. So this was done in 1969, so there's an implicit assertion that artificial intelligence, at that time, is a worthy thing. But of course it doesn't really define it. Then the text talks about how his research includes important contributions to cognitive psychology, neural networks, automata theory, symbolic mathematics, and especially artificial intelligence, including work on learning, knowledge representation, common sense reasoning, computer vision, and robot manipulation. He also made important contributions to graphics and microscope technology. Okay, definitely, like, a brilliant mind here, just the far reaches of his interests and the stuff that he was able to do. He was definitely one of the founders of the artificial intelligence movement. He was there at that initial symposium where they defined the term and defined the field. Like, what is the purpose of artificial intelligence? What are we actually after? Also, he talks, not in this, but in other interviews and stuff, about how lucky he was to have great teachers, and to be living at a time when you'd visit a university and you'd get to meet Einstein and Feynman and Claude Shannon, you know, all these people that are now legendary, as a student. He was soaking all of that in, which I'm jealous of. Like, I just have to say, I envy that. I never had that.

Okay, so the lecture was given in 1969, but it was not published until 1970. It's called Form and Content in Computer Science. Now, if you're not familiar with these episodes, I don't read everything on the recording. I just excerpt the stuff that I want to highlight. And so there's a lot of editorial. I'm leaving a ton of stuff out. I'm just trying to keep it a reasonable length. Like, if you really want to read it, you should just go read it. But I'm going to excerpt it. I'll read the excerpt and I'll make comments. And often I'll summarize stuff that is important, but, you know, I don't want to read the whole thing.

Okay. Form and Content in Computer Science, by Marvin Minsky. The trouble with computer science today is an obsessive concern with form instead of content. That is the first line. And then he acknowledges that that's too strong, too strong a way to begin. And also, I puzzled over this. You know, it's easy to start picking apart the words. I've read it several times, the whole thing, several times. And I guess I feel like this is one of those abstract ideas, like form versus content, that happens when you're trying to deal with different subjects but trying to kind of summarize them in one way. It just gets too abstract. I think it does a disservice to the ideas in the paper. The idea of form versus content: really, throughout the whole thing, he's just saying, we're focused on the wrong stuff as an industry, as a field, as an academic pursuit. We're focused on the superficial stuff. We should get deeper. Okay, so this form versus content, you can see how it relates: the form is, like, the shallow stuff, and the content is the deep stuff. But I've just thought about this so much and tried to really find, like, where is the form and where is the content? And it's such a philosophical debate, form versus content.
I just want to warn you, as we go through it, he does kind of talk about form versus content a lot, but really he's just saying we're looking in the wrong places. Okay. This essay has three parts, suggesting form-content confusion in theory of computation, in programming languages, and in education. So there are three different places: the theory of computation is one, programming languages is two, and education is three. Okay, so it's three separate places, and he's trying to bring them all under this form-content umbrella, but just disregard that. He's going to go over each one and, like, where we're looking and where we should be looking.

Okay, theory of computation. To build a theory, one needs to know a lot about the basic phenomena of the subject matter. We simply do not know enough about these in the theory of computation to teach the subject very abstractly. Instead, we ought to teach more about the particular examples we now understand thoroughly, and hope that from this we will be able to guess and prove more general principles. I think that many of our beliefs that seem to be common sense are false. We have bad misconceptions about the possible exchanges between time and memory, trade-offs between time and program complexity, software and hardware, digital and analog circuits, serial and parallel computations, associative and addressed memory, and so on.

Okay, so there's a lot to unpack here, but he's basically saying we don't have any good theories in the theory of computation. We don't have what you would find in physics, with its different kinds of conservation laws. Well, let me read: It is instructive to consider the analogy with physics, in which one can organize much of the basic knowledge as a collection of rather compact conservation laws. Okay, he gives some examples: conservation of energy; quantum theory, which is like a trade-off between the certainties of position and momentum, or between time and energy. There is nothing extraordinary about this. Any equation with reasonably smooth solutions can be regarded as defining some kind of trade-off among its variable quantities. Okay, so he's simply pointing out that we understand physics well enough that we can just draw these curves out and talk about, you know, if you go up the curve, you're going to go down in this quantity and up in this quantity, and if you go down, it's the reverse. And we can't do that in computation. There are many ways to formulate things, and it is risky to become too attached to one particular form or law and come to believe that it is the real basic principle. Okay, it's just a warning. And what he's talking about is, you know, you could formulate Newtonian mechanics in terms of energy and think that that conservation of energy law is what's actually happening there, right, that it's the real principle. But it's just an equation, and it's just one way of seeing things. You know, you could also look at momentum and position, for instance.

All right. The recognition of exchanges is often the conception of a science, if quantifying them is its birth. Oh, okay, I hate these kinds of metaphors, but it needs to be picked apart. So recognizing comes before quantifying. Right, so you can recognize that there's something here, there's some trade-off, but we don't know exactly how to formulate it. What do we have in the computation field of this character?
In the theory of recursive functions, we have the observation by Shannon that any Turing machine with q states and r symbols is equivalent to one with two states and nqr symbols, and to one with two symbols and n'qr states, where n and n' are small numbers. Thus, the state-symbol product qr has an almost invariant quality in classifying machines. Unfortunately, one cannot identify the product with a useful measure of machine complexity, because this, in turn, has a trade-off with the complexity of the encoding process for the machines, and that trade-off seems too inscrutable for useful application.

Wow, there's so much to say about this. Okay, first, I did not know that Shannon did this. Claude Shannon. And this kind of citation, where it's just this casual observation by Shannon, makes me feel like I've wasted my whole life. Like, of course I should know about Shannon. I have a master's degree in computer science. Why don't I know this? But we just don't learn enough history in school, and we can't rely on our schooling to provide all the education we need. I mean, that's basically what I take from it. And that's why I'm reading these old papers and stuff, starting somewhere. Okay, so he's basically saying that if you look at this finding by Shannon, there's this qr, this product, and there's some, like, constant that you multiply it by, but that qr seems to be there in both cases. But how do you, you know, how do you reconcile this into a law? And it seems like we can't really do it, because encoding this Turing machine is actually also a hard, intractable problem. And so maybe we shouldn't. Maybe there's nothing there. Okay, but I do like to bring these up. I considered skipping this section, but I do like to bring these up because it really shows the kind of thinking that he's trying to get us to do. You know, if the question is, how is he different, and what does he see, this is the kind of thing you have to look at. He's actually trying to find something fundamental that we can start talking about.

Let us consider a more elementary, but still puzzling, trade-off: that between addition and multiplication. How many multiplications does it take to evaluate the three-by-three determinant? If we write out the expansion as a sum of six triple products, we need 12 multiplications. If we collect factors, using the distributive law, this reduces to nine. What is the minimum number? And how does one prove it? In this, and in the n-by-n case, the important point is not that we need the answer. It is that we do not know how to tell or prove that proposed answers are correct. One of our prime research goals should be to develop methods to prove that particular procedures are computationally minimal in various senses. I'm not sure whether we have this today, whether we can do what he's suggesting, this way of being able to prove that we have some minimal solution. Ordinary intuition has been wrong for a long time. So wrong that apparently no one looked for better methods. We still do not have a set of proof methods adequate for establishing exactly what is the minimum trade-off exchange in the matrix case between multiplying and adding. The multiply-add exchange may not seem vitally important in itself, but if we cannot thoroughly understand something so simple, we can expect serious trouble with anything more complicated.
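Just to make his counting concrete, here's a little sketch in Python of the two expansions he's comparing. The function names are mine; the multiplication counts are tallied in the comments.

```python
# Minsky's 3x3 determinant example, for the matrix
# | a b c |
# | d e f |
# | g h i |

def det_expanded(a, b, c, d, e, f, g, h, i):
    # Six triple products, two multiplications each: 12 multiplications.
    return a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h

def det_factored(a, b, c, d, e, f, g, h, i):
    # Collect factors with the distributive law: six multiplications
    # inside the parentheses plus three outside them: 9 multiplications.
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

# Same answer either way. His question: is 9 minimal, and how would
# you even prove that it is?
assert det_expanded(1, 2, 3, 4, 5, 6, 7, 8, 10) == \
       det_factored(1, 2, 3, 4, 5, 6, 7, 8, 10) == -3
```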
I want to go back for a second, to where he says, in the first paragraph of this section, we simply do not know enough about these in the theory of computation to teach the subject very abstractly. Instead, we ought to teach more about the particular examples we now understand thoroughly, and hope that from this we will be able to guess and prove more general principles. He's very conscious that we are not physics yet. We're early in the field, and we don't have these kinds of laws where you can just tell the students, learn these, and once you get into grad school, then you can start doing novel research. We don't have that. We need to start teaching like, look at this problem. It's kind of an unsolved problem. If you have to compute some matrix determinants, right, how do you know how many multiplications you need? How do we know if this is the minimal number? Could you find a solution that had fewer, or is there a trade-off between additions and multiplications? What's going on? We don't know. We don't know. We have no language for talking about this. And I believe he's saying that these are the kinds of fundamental questions that we're not preparing computer science students to even approach answering. And these are the kinds of questions that would lead to a better understanding of computation.

Consider another trade-off: that between memory size and computation time. In our book, Papert and I have posed a simple question. This is Seymour Papert, who he worked with. Given an arbitrary collection of n-bit words, how many references to memory are required to tell which of those words is nearest, in number of bits that agree, to an arbitrary given word? Okay, so you've got a bunch of bit sequences, n-bit words, so bytes or what have you. And you're given a new one, a random one. Which of all the ones in that collection is closest to the one you have? Like, how many bit flips would it take to turn one into the other? That's the distance. So which one is closest? Okay, simple problem. Very simple problem. It's one of those thought-experiment problems where you construct an arbitrary problem that is exactly hard enough to give you some interesting properties. It's very abstract. There's nothing particularly useful about this encoding of the problem.

How must the memory size grow to achieve a given reduction in the number of memory references? Okay, so you're going to write a procedure for finding this closest match, and the question is, what happens as you grow the memory? He talks about the extremes. If memory is large enough, only one reference is required, for we can use the question itself as an address and store the answer in the register so addressed. Okay, you can just solve the problem for every single possible given bit string up front, and use that bit string as the address. So for the question 0110, you just put the answer at address 0110, and then you just look it up, right? So you're trading, you know, up-front time and space for instant answering. Okay. But if memory is just large enough to store the information in the library, then one has to search all of it. Okay, so if you just have enough, let's say you have 10 bytes of data and you only have 10 bytes of memory, you have to go through all of it each time and look. Okay, but here's the thing: And we do not know any intermediate results of any value.
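Here's a small sketch, in Python, of the two extremes he just described: the full lookup table and the linear scan. The names and the 8-bit setup are mine, just to make the trade-off tangible.

```python
# The Minsky/Papert question: given a library of n-bit words, find the
# one nearest in Hamming distance to a query word. Two extreme strategies:

def hamming(a, b):
    return bin(a ^ b).count("1")  # number of disagreeing bits

# Extreme 1: memory just big enough for the library -> scan everything,
# one memory reference per stored word.
def nearest_by_scan(library, query):
    return min(library, key=lambda w: hamming(w, query))

# Extreme 2: 2^n registers -> precompute every answer up front,
# then every query costs exactly one reference.
def build_table(library, n):
    return {q: nearest_by_scan(library, q) for q in range(2 ** n)}

n = 8
library = [0b00001111, 0b11110000, 0b10101010]
table = build_table(library, n)          # 256 entries for n = 8
assert table[0b00011111] == nearest_by_scan(library, 0b00011111)

# The open question he's pointing at: with, say, 2x or 10x the
# library's memory, how few references can you guarantee? Nobody knew.
```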
Okay, what if you have twice the memory? What if you have 10 times the memory? We don't know. We don't have a good answer for how you determine how many lookups are required. It is surely a fundamental theoretical problem of information retrieval, yet no one seems to have any idea about how to set a good lower bound on this basic trade-off. So we can't set the lower bound, but we know that we can solve this problem, right? As programmers, we can be clever, and we can figure out a reasonable algorithm that uses a reasonable amount of space and time to figure this out. It's not the worst case and it's not the best case, but we can figure it out. So there must be something we're doing, but we don't have the language to talk about it.

There is, in today's computer science curricula, very little attention to what is known about such questions. Almost all their time is devoted to formal classifications of syntactic language types, defeatist unsolvability theories, folklore about systems programming, and generally trivial fragments of optimization of logic design. Most courses treat the results of work in artificial... oh, okay, so he's basically saying the curricula are not addressing the important questions. They're focused on all these trivial things, and they need to be dealing with the meat. Like, this is important stuff. Most courses treat the results of work in artificial intelligence, some now 15 years old, as a peripheral collection of special applications, whereas they in fact represent one of the largest bodies of empirical and theoretical exploration of real computational questions. Until all this preoccupation with form is replaced by attention to the substantial issues in computation, a young student might be well advised to avoid much of the computer science curricula. Learn to program, acquire as much mathematics and other science as you, sorry, as he or she can, and study the current literature in artificial intelligence, complexity, and optimization theories.

Okay, so he brought up artificial intelligence. And I mentioned before that he probably felt dismissed. This is just my feeling: he and his colleagues in the artificial intelligence community had been publishing these very important findings, and it had been dismissed, and he's stating that explicitly here. He says that they are treated as a peripheral collection of special applications. Okay, so this is the kind of resentment that I think is coming out here, that they've been working on this stuff and no one appreciates it. And, I mean, I tend to agree. I think that artificial intelligence has this curse where anything that works in artificial intelligence is no longer considered artificial intelligence. It's just like, oh, that's just good programming practice. And so then it seems like, you know, what has artificial intelligence ever done, right? And unfortunately, this curse has been there since the beginning. I didn't know that it started so long ago. Yeah, and I think that this focus on, how do we solve these hard problems that are unanswered? I just have to agree. I think he's right. These are the kinds of things that will make computer science into a real science, you know, a real field, and give us deep insights into the kinds of things that computer science can teach us about the world.
Okay, I know, I'm skipping ahead in my mind, I know there are some really cool people coming up who have perhaps dealt with this topic more fully than he's dealing with it here. So we'll get to those. Just stay tuned. Okay, so section two. It's about programming languages. So I already read this paragraph; I'll read it again. Compilers are not concerned enough with the meanings of expressions, assertions, and descriptions. The use of context-free grammars for describing fragments of languages led to important advances in uniformity, both in specification and in implementation. But although this works well in simple cases, attempts to use it may be retarding development in more complicated areas. There are serious problems in using grammars to describe self-modifying or self-extending languages that involve executing as well as specifying processes. One cannot describe syntactically, that is, statically, the valid expressions of a language that is changing. Okay, he's going to go deeper into this; here he's just kind of stating the problem. It's very dense and hard to parse, but he will go deeper into it. There's this obsession with syntax and using grammars to specify a language, as opposed to the maybe more interesting and more important ideas of what the language actually means and what the language will allow you to express. It's not a syntactic concern. Again, it's this form-content thing that he's trying to talk about. We're talking more about the form of the language, the syntax, and less about what the language allows you to do.

Okay. Computer languages of the future will be more concerned with goals and less with procedures specified by the programmer. I've heard this before. This is one of those things that comes up again and again in the kind of work I'm interested in, where people talk about how programming -- this is what happens. You make this new higher level of programming. You start with machine language, you go to assembly, and it feels like you're cheating for a little bit. "Oh, the computer's doing all this work for me. Look at all the stuff I don't have to do myself." In assembly, you get to label the lines and jump to those lines, and the computer, the compiler, the assembler, will calculate the offset for you. In machine code, you'd have to recalculate the offset every time you inserted a new line, but here you just label it and jump to it. It's so semantically meaningful. I can put a real English word as the label. It feels like cheating. But then after a while, it doesn't feel so high-level anymore. You're still -- okay, add these two numbers. Okay, store them over here. Okay, get this thing and that thing and multiply them and store the result over there. It's still low-level. Okay, so then you invent C, let's say, and C feels -- oh, man, so high-level. I can do if-then-else, and I can do a for loop, and look at all the stuff it's doing for me. Then after a while, you're like, "Yeah, this is pretty low-level still." I get to name my variables, but I still feel like I'm just, multiply this, move it over here, call this function. Like, it's a very mechanical feeling after a while, right? And so what's the next thing? Where do we go? Well, he's saying that computer languages of the future will be more concerned with goals and less with procedures specified by the programmer. What does that -- I mean, what would that even look like? Well, he's going to go into that. He's going to go into that. Okay, so subsection 2.1: Syntax is often unnecessary.
One can survive with much less syntax than is generally realized. Please do not think that I am against the use, at the human interface, of such devices as infixes and operator precedence. They have their place, but their importance to computer science as a whole has been so exaggerated that it is beginning to corrupt the youth. He's not one to hold his tongue. Okay, so he describes square root in both this kind of language with infix operators and stuff, which looks a lot like Python or something, and then he also defines it in a Lisp. Okay, so in the Lisp, the function names come immediately inside their parentheses. Just a little intro to the syntax. The clumsiness for humans of writing all the parentheses is evident. The advantage of not having to learn all the conventions of operator precedence, et cetera, is often overlooked. Okay, this is a common argument about Lisp. You know, yes, we know it's not that convenient to have to write the parentheses around plus every time, the way infix notation with operator precedence rules is convenient. But the fact that you don't have to learn the operator precedence rules is an advantage.

Okay. It remains to be seen whether a syntax with explicit delimiters is reactionary, or whether it is the wave of the future. It has important advantages for editing, interpreting, and for the creation of programs by other programs, right? So this is something I've talked about elsewhere, where you often want two levels of design, right? You want to design a system for other computers to use, so a programmatic API, and you also want to provide a convenience layer for humans. And these often get conflated, or people prefer one or the other and forget that both are important. The common example that always pops into my mind is SQL, the language. It's decent. It's designed for humans to write. It's decent for, like, going into a prompt, typing a SQL command to the database, and getting the answer, right? What it's not so great at is the programmatic standpoint. You often have to, like, concatenate strings and do all sorts of weird gymnastics to build the SQL that you want programmatically. You know, if you want to do a join, the join syntax is way different from the, like, just-give-me-one-table's-data syntax. And so it just doesn't compose well. And so the mistake isn't that we shouldn't use SQL. The mistake is that SQL should have been built on top of a programmatic design. Okay. So that's the kind of thing he's talking about. He's saying, it might be a little hard to use these parentheses, but if you're generating programs, meaning you're trying to access this from a computer, you're writing a program to write programs, it is really nice. And I agree.

Despite the language's clumsiness, many frontier workers consider it to have outstanding expressive power. Nearly all work on procedures that solve problems by building and modifying hypotheses has been written in this or related languages. Unfortunately, language designers are generally unfamiliar with this area and tend to dismiss it as a specialized body of symbol manipulation techniques. And I think that that's still true today. People don't recognize the power of being able to manipulate code directly in your programming language.
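Just to make the explicit-delimiters point concrete, here's a toy sketch in Python, mine, not anything from the lecture, of an evaluator for fully parenthesized prefix expressions. Notice there's no precedence table anywhere: the parentheses are the whole parse, which is also exactly what makes these expressions easy for other programs to build and manipulate.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def tokenize(s):
    return s.replace('(', ' ( ').replace(')', ' ) ').split()

def parse(tokens):
    """Read one expression: an atom, or ( op arg1 arg2 )."""
    tok = tokens.pop(0)
    if tok != '(':
        return float(tok) if tok not in OPS else tok
    expr = []
    while tokens[0] != ')':
        expr.append(parse(tokens))
    tokens.pop(0)  # discard the ')'
    return expr

def evaluate(expr):
    if isinstance(expr, list):        # (op arg1 arg2)
        op, a, b = expr
        return OPS[op](evaluate(a), evaluate(b))
    return expr                       # a number

# No precedence rules to learn or to implement: (+ 1 (* 2 3)) can
# only mean one thing.
print(evaluate(parse(tokenize("(+ 1 (* 2 3))"))))  # 7.0
```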
Okay. Next subsection, 2.2: Efficiency and understanding programs. What is a compiler for? The usual answers resemble "to translate from one language to another" or "to take a description of an algorithm and assemble it into a program, filling in many small details." For the future, a more ambitious view is required. Most compilers will be systems that produce an algorithm, given a description of its effect. Okay. So this is the future. This is the ambition. And this is what he's talking about: describing the desired outcome you want in a, quote, more declarative way, rather than describing the steps to achieve it.

All right. So this is one of those things that sounds great in principle. But when I really sit down and try to think, what would such a language look like, and what kinds of things would I want to describe in it, you know, I don't really come up with much. So I hope that he can provide a little more insight into what he's talking about, because I certainly can't. You know, he gives a couple of examples, like a graphics language. So something like Graphviz, I'm imagining. Graphviz is for drawing, not graphs like histograms and line graphs, but graphs like nodes and edges, and laying those out. Graphviz has a language, and it's very declarative. You say A is connected to B, and B is connected to C, and C is connected to D, and it will draw it. Okay. And it's good. You just change it up, you move the things around, and it just figures it out and lays it out, and, you know, it tries to minimize the arrows that cross and all that stuff. So it's good at this small domain, right? Okay. But other than that kind of thing, where it's a very small domain, I, you know, have trouble imagining it.

Okay. But here he's going to give a good example. The real payoff is in analysis of the computational content of the algorithm itself, rather than the way the programmer wrote it down. I'll read it again. The real payoff is in analysis of the computational content of the algorithm itself, rather than the way the programmer wrote it down. Okay. So today, if I write an algorithm, the compiler is not going to analyze, like, what is this really trying to do? What is this programmer trying to achieve here? It is going to compile it, maybe apply some optimizations, and that's it. And it's just going to do the stupid thing I said to do. Whatever mistakes I make, it's just going to encode those into the program, right? All right. But what would it look like otherwise? So he describes, you know, the kind of naive Fibonacci that, if you've, and you probably have, you've written this before: it's recursive, but the recursion branches, like, a lot, and it duplicates work, right? It calculates fib of one, like, you know, exponentially many times. And then he describes how any programmer would recognize that you could just write this as a loop, right? It's not a hard loop to write, and it's super fast, and it doesn't require all this recursion. Any programmer will soon think of these solutions once he or she sees what happens in the branching evaluation. If you run it one time, fib of 10, and then fib of 100 never finishes, you're going to be like, what's going on? And you see, ah, too much recursion. Let me rewrite it as a loop. Today's compilers don't recognize even simple cases of such transformations, although the reduction in exponential order outweighs any possible gains in local optimization of code.
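Here's that example sketched out in Python, my rendering of the transformation he has in mind: the recursion straight off the mathematical definition, next to the loop any programmer would eventually rewrite it as.

```python
# Fibonacci straight from the mathematical definition. The recursion
# branches twice, so the small cases get recomputed exponentially often.
def fib_naive(n):
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

# The rewrite any programmer finds after watching the naive version
# crawl: linear time, constant space, same answers.
def fib_loop(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert fib_naive(20) == fib_loop(20) == 6765
# fib_loop(100) is instantaneous; fib_naive(100) would effectively
# never finish. Minsky's point: no compiler of his day could discover
# this transformation, even though it dwarfs any local optimization.
```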
So we've got these, you know, compiler people, optimizer people, who are like, oh, I can swap out this addition for a multiplication and get, like, one cycle faster. Like, yeah, sure, except it's still exponential. So, you know, any gains you make are totally dwarfed by just increasing n by one. He's asking, why doesn't our compiler do this for us? You write the really naive thing, you know, basically the mathematical definition of Fibonacci just translated into code. Why doesn't it translate that into a more efficient solution? To be sure, a systematic theory of such transformations is difficult. A system will have to be pretty smart to detect which transformations are relevant and when it pays to use them. Since the programmer already knows his or her intent, the problem would often be easier if the proposed algorithm were accompanied, or even replaced, by a suitable goal-declaration expression. To move in this direction, we need a body of knowledge about analyzing and synthesizing programs.

Okay, so here is where I think it ties back in with that first section on the theory of computation. He's saying, okay, you write some random code, some random procedure, and the system wants to understand it. That's actually really hard. So why don't we have the programmer, instead of writing the procedure, just say what they want done? Like, what result do they want? Okay, now that, to me, opens up a whole can of worms about how you encode desire and intent and all that, but he thinks that that's actually easier than getting a program to understand what you wrote. And this ties back into the theory of computation because a smart compiler would have to have a model of computation, and of what the relevant trade-offs are that are worth making to optimize your program.

There is no practical consequence to the fact that the program reduction problem is recursively unsolvable in general. Okay, so this is often the argument that is made: oh no, your problem is equivalent to the halting problem. Understanding an arbitrary program, determining what it does, whether it's equivalent to some other program, what the programmer intended it to do, that is, you know, equivalent to the halting problem, so it's impossible. That's true. Okay, that's very true. It is impossible in general. But people can do a pretty good job at it, right? Like, we can recognize this Fibonacci program as very inefficient and find the more efficient implementation that uses a loop. So if we can do it, why can't the computer? Okay, so you don't have to solve it in general. You just have to provide some value, some help, to the programmer, right? You don't have to recognize every possible program. You just have to be able to, I mean, almost pattern match. And, you know, I think that there's a recognition of this. There are a lot of static analysis systems that work on this principle. Like, if we can just recognize that you forgot to increment your loop counter, so you have an infinite loop. We don't have to do a total analysis of your program and find every bug. We just have to help you find, like, a case where you didn't check the array bounds. You know, that's actually a much easier problem to solve. And, again, you can't solve it in the general case, but you can solve it heuristically. And that's how people do it.
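As a tiny illustration of that heuristic spirit, here's a sketch in Python, my own toy, not any real tool, that flags `while` loops whose condition variables are never reassigned in the loop body. It's wildly incomplete, the halting problem guarantees it has to be, but an incomplete check like this still catches a real class of bugs.

```python
import ast

def suspicious_while_loops(source):
    """Heuristically flag `while` loops where no variable from the loop
    condition is ever assigned in the body -- a likely infinite loop.
    Incomplete on purpose: we only need to catch some bugs, not all."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.While):
            continue
        cond_vars = {n.id for n in ast.walk(node.test)
                     if isinstance(n, ast.Name)}
        assigned = set()
        for stmt in node.body:
            for n in ast.walk(stmt):
                if isinstance(n, ast.Assign):
                    for target in n.targets:
                        assigned |= {m.id for m in ast.walk(target)
                                     if isinstance(m, ast.Name)}
                elif isinstance(n, ast.AugAssign) and isinstance(n.target, ast.Name):
                    assigned.add(n.target.id)
        if cond_vars and not (cond_vars & assigned):
            warnings.append(node.lineno)
    return warnings

buggy = "i = 0\nwhile i < 10:\n    print(i)\n"
fixed = "i = 0\nwhile i < 10:\n    i += 1\n"
print(suspicious_while_loops(buggy))  # [2] -- flagged
print(suspicious_while_loops(fixed))  # []
```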
Okay, in any case, one would expect programs eventually to go far beyond human ability in this activity and make use of a large body of program transformations in formally purified forms. These will not be easy to apply directly. Instead, one can expect the development to follow the lines we have seen in symbolic integration, e.g., as in Slagle and Moses. Okay, so this was work done in a subfield of artificial intelligence, on symbolic integration. Like, how do you simplify these giant expressions and integrate them symbolically instead of numerically? First, a set of simple formal transformations that correspond to the elementary entries of a table of integrals was developed. On top of these, Slagle built a set of heuristic techniques for the algebraic and analytic transformation of a practical problem into these already understood elements. Okay, and then they had some heuristic search strategy on top of that. Okay. A heuristic compiler system will eventually need much more general knowledge and common sense than did the symbolic integration systems, for its goal is more like making a whole mathematician than a specialized integrator. Integration, you know, once it was solved, they recognized it as a pretty confined problem. It's a constrained problem; it's easier to solve. Whereas the kinds of stuff that programmers do in a programming language is probably bigger than just symbolic integration. He's also hinting at something from a future Nobel laureate, Herb Simon: this heuristic idea, Herb Simon's idea, of satisficing. Like, you just need to get something good enough to be useful. It doesn't actually have to be, like, theoretically perfect. And we can't hide behind these impossibility theorems.
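For flavor, here's what the first layer of that approach might look like: a tiny "table of integrals" as rewrite rules over tuple-encoded expressions, sketched in Python. This is my illustration of the shape of the idea; Slagle's and Moses's systems were of course far richer, with the heuristic transformations layered on top of a table like this.

```python
# Expressions as nested tuples: ('+', a, b), ('^', 'x', n), numbers, 'x'.
def integrate(expr, var='x'):
    """Apply elementary table-of-integrals entries, recursively."""
    if expr == var:                               # integral of x dx = x^2 / 2
        return ('/', ('^', var, 2), 2)
    if isinstance(expr, (int, float)):            # integral of c dx = c*x
        return ('*', expr, var)
    op = expr[0]
    if op == '+':                                 # linearity of integration
        return ('+', integrate(expr[1], var), integrate(expr[2], var))
    if op == '^' and expr[1] == var and expr[2] != -1:
        n = expr[2]                               # power rule
        return ('/', ('^', var, n + 1), n + 1)
    raise ValueError(f"no table entry for {expr!r}")

# integral of (x^2 + 3) dx  ->  x^3/3 + 3x
print(integrate(('+', ('^', 'x', 2), 3)))
# ('+', ('/', ('^', 'x', 3), 3), ('*', 3, 'x'))
```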
2.3: Describing programming systems. One should remember that in describing a language, the main goal is to explain how to write programs in it and what such programs mean. The main goal isn't to describe the syntax. I wonder who he's talking to there. It just seems like there's a lot of frustration there. It's not about the syntax. It's about meaning. It's about, you know, the practical ability of a programmer to write programs in the language. It's not about having this nice description of the syntax. To describe an unambiguous syntax corresponds to designing a mathematical system in which each theorem has exactly one proof. For many purposes, ambiguity is a pseudo-problem. Not a real problem. If we view a program as a process, we can remember that our most powerful process-describing tools are programs themselves, and they are inherently unambiguous. So I don't know if I follow. I don't know why this is important, why a program written in an ambiguous language is inherently unambiguous. So I don't really know what he's getting at here. But I think that he's just trying to take the focus off of the syntax and this formal field of, sort of, language classification and stuff. We could just write a program that is an interpreter for the language we want. Does it need to be perfectly unambiguous, and have all these other theoretical properties? Probably not. It just needs to work. That's what I see him saying. And sure, those properties are interesting, but maybe we're focusing too much on them instead of on getting the computation right, the meaning right. There's no paradox in defining a programming language by a program. The procedural definition must be understood, of course.

One can achieve this understanding by definitions written in another language, one that may be different, more familiar, or simpler than the one being defined. But it is often practical, convenient, and proper to use the same language. For to understand the definition, one needs to know only the working of that particular program, and not all implications of all possible applications of the language. It is this particularization that makes bootstrapping possible, a point that often puzzles beginners as well as apparent authorities. He's bringing up this idea of bootstrapping. He's worried that people are really focused on grammars and BNF definitions and how to parse and stuff. This is the same idea, this form versus content. The idea of a program whose meaning is interpreted by another program is lost, this bootstrapping idea. Instead we want to statically define it and analyze it and be able to understand all aspects of it and make it totally unambiguous. But understanding that formal system of a language, which he says is equivalent to designing a mathematical system in which each theorem has exactly one proof: is that even important? We should be focusing on the meaning of the program and how to get things done.

The last section is called Learning, Teaching, and the New Mathematics. Totally, like, jumping to a new topic here. And it's not about learning and teaching in the academic field of computer science. It's not computer science education; it's education in general, so kids, basically. Education is another area in which the computer scientist has confused form and content, but this time the confusion concerns his or her professional role. He or she perceives his or her principal function to be to provide programs and machines for use in educational schemes. Well and good, but I believe he or she has a more complex responsibility: to work out and communicate models of the process of education itself. Okay, just like before, he's got a very strong opening paragraph that needs a lot of unpacking. So, a lot of people talk about computers in education, and even these days they're like, oh, you've got to learn to code, coding is the future. There are a lot of jobs in coding, and everyone's going to need to learn to code, so let's learn to program. Also, computers need to be part of education because people need to learn how to use computers and use the interface. And kids learn these complicated devices really easily, especially the younger they are, so let's teach them how to type and copy and paste and all this stuff. And so you see computers being used. And, oh, YouTube is so educational, there's all this great content on the internet and kids can learn from it. But all of this is kind of missing the deeper point about computers in education. And I myself, so, I didn't have a computer in school. We had a computer class where we learned how to type and stuff. I'm too old for that. But I saw my cousins who had laptops in grade school, and they had to do their homework on their computers, and I saw that this is not what computers are useful for. I could see that.

And so he thinks that computer science has a lot to teach us about process itself, about procedures. Right, so I'm kind of jumping the gun a little bit, but when you count, or when you do, let's say, an arithmetic problem, there's a procedure you're following. You add these two digits and you get a sum and a carry, and then you go to the next column, and you sum and carry, and sum and carry, and sum and carry. So there's a procedure to it. Sometimes you get it right and sometimes you get it wrong. You mess up a step in the procedure, right? And it's this idea of procedure that he's saying computer science has something to say about.
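That procedure is concrete enough to write down, which is rather the point. Here's a sketch of it in Python, my rendering of grade-school column addition with the sum-and-carry step made explicit: the kind of externalized procedure a student could trace, test, and debug.

```python
def add_by_columns(x, y):
    """Grade-school addition: in each column, add the two digits plus
    the incoming carry; write down the sum digit; carry the rest."""
    xs, ys = str(x)[::-1], str(y)[::-1]   # columns, right to left
    carry, digits = 0, []
    for i in range(max(len(xs), len(ys))):
        a = int(xs[i]) if i < len(xs) else 0
        b = int(ys[i]) if i < len(ys) else 0
        total = a + b + carry
        digits.append(total % 10)         # the sum digit for this column
        carry = total // 10               # the carry for the next column
    if carry:
        digits.append(carry)
    return int("".join(map(str, reversed(digits))))

assert add_by_columns(487, 365) == 487 + 365  # 852
# A student who "forgets the carry" is running this exact procedure
# with `carry` reset to 0 every column -- a bug you can name, write a
# test case for, and fix. That's the educational payoff he's after.
```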
All right. He did a lot of this work with Seymour Papert. There's a cool legend about them meeting in this discarded building at MIT, some old building where they worked on radar during the war, and then afterward no one was using it, but no one cared, either. Like, if you tacked stuff on the walls and painted and tore down walls and stuff like that, no one cared. And there was all this old discarded equipment, so there was a lot of cool stuff to do, and a lot of chance encounters. A lot of misfits were in this building, and they had cool collaborations, and this is one of them. All right, so this is Seymour Papert and Marvin Minsky's work. And of course Seymour Papert was a protégé of Piaget, or a student of Piaget, and Piaget was a huge name in education, right, in development, mental development, psychological development. So, and maybe this is me projecting a little too much of my past knowledge, but I think Papert is the one bringing this sense of education to Minsky.

Okay, so the following statements are typical of our view. So that's Papert and Minsky's view. To help people learn is to help them build, in their heads, various kinds of computational models. The student, when debugging his or her own models and procedures, should have a model of what he or she is doing and must know good debugging techniques, such as how to formulate simple but critical test cases. It will help the student to know something about computational models and programming. The idea of debugging itself is a very powerful concept. These have the sound of common sense. So that was a partial list of the principles they lay out. These have the sound of common sense, yet they are not among the basic principles of any of the popular educational schemes. This is not because educators have ignored the possibility of mental models, but because they simply had no effective way, before the beginning of work on simulation of thought processes, to describe, construct, and test such ideas.

Okay. Wow, there's a lot to say here, and I think I'm going to get into some of it a little bit more later, but notice this isn't about how we write a program that can teach a kid. And this isn't how we teach programming, either. It is that the computer scientist basically has access to this new field that we have only now been able to develop an understanding in, which is, it's not math, but it's some theoretical framework about procedures. And that gives us an incredible language and set of techniques that could help education. Okay, so, put another way, the advantage of learning to code is not that you can get a job as a JavaScript programmer. It is not that you can understand the computer better. It's that you will understand procedure and be able to apply it endlessly. So, for instance, debugging, and how to approach these problems, and being able to apply that to your own mental processes. Okay, you get a sense of algorithm, and this idea that multiple algorithms can compute the same answer, so let me try a different algorithm.
Oh, there's a different approach to the procedure. What if I break it in half and have this easier sub-procedure to write? You know, computer science can teach us more about ourselves than we knew was possible. There's a story about Minsky. He actually came up with a, man, I'm going to get the numbers wrong, but he came up with a way of encoding a Turing machine that had two registers. Okay, maybe it was three, but it's a very low number, and it was, like, the lowest number found, right? And he said that he actually can't prove it, but he thinks that that might be the only possible way to encode a Turing machine with two registers. Three is easier. A lot of people found different ways of encoding the three-register Turing machine, but the two-register one is, like, no, that might be the only one.

Okay, what does that tell us about ourselves, about how we think, about what makes us different from a dog? A dog is very intelligent. A dog can, you know, learn a handful of words and can operate very well in its environment. It can collaborate with other dogs to get stuff done, can solve some very simple problems, right? But there's something different about that dog from us. Okay, so what can this teach us? Maybe we have a couple more registers, right? Like, maybe the dog has two registers, and because there's only one way to make a two-register Turing machine, just by chance it never gets there. The dog cannot achieve Turing completeness, because it only has two registers, whereas we have three, and it's much more likely for evolution to find one of the many ways of encoding a three-register machine. I might be getting the numbers wrong. That's not what's important. It might be, like, three and four instead of two and three, right? But the idea is that just a quantitative difference, two registers versus three, making it a hundred times more likely to develop Turing completeness, gives us some notion of how we might have evolved to have a Turing-complete brain. And even the idea that we can conceive of the Turing machine as a, you know, very theoretical model of how we can think and how we can carry out procedures, right? This is something that computer science can teach us. Okay, and that was there at the beginning. Turing talked about this, right? But we've, I don't know, dismissed that, or, I'm not sure.

But this idea that computer science can teach us about ourselves, this is what drew me to programming in the first place. Once I started learning to program, I felt like, wow, I can actually encode my knowledge, watch it run, and realize how I'm wrong. And it's not just a metaphor for learning. It is an externalization of the process of learning, through a medium that enables it, that makes the cycle time faster, that lets me check my work faster, check my thinking faster. So that's a very personal story, but that's how I got into it. That's why I got a master's degree in artificial intelligence, because I felt like this is a way of understanding humans better. Okay. He's going to go on, in section 3.1, to, I mean, I think too exhaustively, describe all the abilities of a five-year-old kid and what they can and can't do with math. So I'm not going to read most of that section.
Okay, it's actually, like, nine pages of this, and I don't... I mean, it demonstrates one very important thing: that they did spend a lot of time understanding kids, and the developmental stages that they go through, and their particular abilities at different ages. It demonstrates that, and so you do trust that they, Marvin Minsky and Seymour Papert, know what they're talking about. Okay. All right, so I'm just going to read a tiny bit, just so you get a taste of it. Imagine a small child of between five and six years, about to enter the first grade. Our child will be able to count, one, two, three, at least up to 30, but probably up to a thousand. He will know the names of some larger numbers, but will not be able to see, for example, why 10,000 is a hundred hundreds. He will have serious difficulty in counting backwards, unless he has recently become very interested in this. He can count four to six objects with perfect reliability, but he will not get the same count every time with 15 scattered objects.

Okay, so there's just this exhaustive list of all the stuff they can do and can't do. They're very close, it seems like they can get there, but then they're not quite there yet. I'm a parent of two kids, and this sounds exactly right to me. Like, you can see, oh, this kid can do this thing, but why can't they go one little step further? It's so obvious, but of course some part of their brain is just not ready for that yet. You know, the classic example: I remember when my first daughter was playing with a cup, and you could just tell that she considered it to be this object that took up space, right? And then one day, she's holding the cup, she looks at her hand and she looks inside the mouth of the cup, and she puts her hand in, takes it out, puts her hand in. So she's making this discovery that, oh, there's an inside and an outside. It's not just a volume. Development proceeds very discretely like that. It's not a smooth thing. There are often these leaps that happen. All right, so there's this very accurate model of development. I think he talks a little too much about it, in unnecessary detail. All right, but then here's the point.

Before computation, the community of ideas about the nature of thought was too feeble to support an effective theory of learning and development. It needs a substrate of already debugged theories and solutions of related but simpler problems. Now we have a flood of such ideas, well defined and implemented, for thinking about thinking. Only a fraction are represented in traditional psychology. So then there's this big list. I'll read a few: symbol table, pure procedure, time sharing, calling sequence, functional argument, memory protection, dispatch table, error message, function call trace, breakpoint, languages, compiler, indirect address, macro, garbage collection, list structure, block structure, look-ahead. I read less than half of them. Notice they're programming concepts, and he's saying that these apply to the way we think, or at least by analogy they could. There might be some relationship between these things and how we think. And an educator who knew these things might be able to say, "Ah, wouldn't it be nice if we had a symbol table?" Or, you know... I don't know, I'm just trying to find one real quick. What if you had a decision tree, right? That could help you think through this problem.
You know, solve this problem that you couldn't solve before. Okay, I'm going to skip some more. Okay. The idea of a procedure, and the know-how that comes from learning how to test, modify, and adapt procedures, can transfer to many of the child's other activities. Functions are easy to grasp as procedures, hard if imagined as ordered pairs. If you want a graph, describe a machine that draws the graph. If you have a graph, describe a machine that can read it to find the values of the function. Both are easy and useful concepts. So notice he's showing how we can use this concept of a machine, a procedure that performs the function, as a way of teaching functions. As an educational device, this is called scaffolding, where the teacher guides the student into a skill by using analogies and stuff. It's one of the ways that we scaffold as teachers. Now, as a teacher myself, I taught kids, not five- and six-year-olds, but I did teach math, and I would have loved to have this stuff, to be able to show a student how to debug their own thought process. In fact, that was one of my most useful techniques, the one that worked the best: to give the student a way to check their own answer, so that they could do a simple problem, check the answer, realize it's wrong, and try again. Where did I go wrong? What happened? We'll try it again. Why don't you list the steps you're doing? There are all these procedural educational devices that computer science has the vocabulary for. I'm extrapolating here. He's proposing that computer science is about more than the machine and getting the machine to do useful stuff. It is a new theory of computation, and of procedures for achieving computation. And I think that that's the thread that flows through this whole lecture: that we should see ourselves more as theoreticians.

Now, this stuff really excites me. Herb Simon, who we will see soon, his idea was that computer science is actually an empirical field, a new empirical method for understanding the world. So there's this really cool stuff going on, and he got a Nobel Prize, and people get these Turing Awards for it. And then what happens? Why don't we have more exploration of this stuff, more work in this cool theoretical area? I don't know. Maybe there's no money in it. You still need money when you're a researcher.

The computer scientist thus has a responsibility to education. Not, as he or she thinks, because he or she will have to program the teaching machines. Certainly not because he or she is a skilled user of finite mathematics. He or she knows how to debug programs; he or she must tell the educators how to help the children debug their own problem-solving processes. The computer scientist is the one who must study such matters, because he or she is the proprietor of the concept of procedure, the secret educators have so long been seeking. So, I mean, I did not know before reading this that he was going to talk about education so much. But this is something I think is really important, and it's been missing from the learn-to-code movement, the Code for America and all that. Is that what it's called? Code for America? No, the one where, like, Obama was coding and learning to code. Programming, just like reading and writing, can have a huge effect on education. More so than just, "Oh, now I can read books." Reading and writing changed the discourse. Because we have reading and writing, and the printing press,
now we can publish ideas faster, more broadly, and have a discussion about those ideas in public, where a larger group of people can participate in the discourse. And I think that programming can have a similar effect. Alan Kay gets into this: teaching kids to program so that they can learn about the world. It's not just about accessing the media, accessing, you know, YouTube videos about gravity so that you learn gravity. That's not what computers in education are about. It's about writing a program to simulate gravity and seeing if it works. Seeing if the ball that you drop in the simulation looks like the ball that you drop in the real world. And now you have a better understanding of that dropping, right? And Seymour Papert's work is all about this, about being able to construct these intricate worlds, like a different reality, and you program in that reality, and you get to discover things about that world. And then it turns out that that reality was teaching you about symmetry, and this reality was teaching you about superstring theory, and, you know, et cetera.

All right, I'm rambling on now. I'm just so excited that he went here, to this space of saying that computer science has something fundamental to teach us that is not just about the computer and, you know, getting a bank to be more efficient because it's electronic instead of on paper. Like, I really think there's a fundamental view of the world that this has revolutionized. And artificial intelligence has also revolutionized psychology and neuroscience and all that. We have a much better understanding of ourselves because of these fields, because of computer science, because of trying to encode things, trying to program, you know, for instance, a chess-playing computer, and a computer that could recognize blocks visually, and a computer that can manipulate a hand. We learned so much about ourselves from that. And he's simply saying, we need more fundamental understanding. And how do you get that? By finding these hard problems that we don't know how to solve, attempting to solve them, saying what we know about them, about how hard they are and what has been tried before, and trying to cobble together solutions. And through that, we will be able to develop, you know, what he calls these conservation laws, et cetera.

Okay. That's enough rambling. That's enough ranting. If you like what you heard here: I'm going over all of the Turing Award winners. I've already done a few, but now I'm going back and starting from the first, and I'm just going to work my way forward. I'll skip the ones I've already done. This is a way of, you know, personally understanding the field better. I think we don't have a good enough understanding of our history. But also, I figure people aren't reading these papers enough, and if you're not reading them, I'm not going to convince you to read them, but maybe you'll listen to a podcast about them. So that's what I'm doing. If you want to subscribe, please do, because that way you'll get the episodes, you know, downloaded to your device, and you can listen in the car, listen on your bike, or however you listen to podcasts. You can find me, find the podcast, at lispcast.com/podcast. You'll find all the links to subscribe in different ways. You'll also find me on social media, and especially email. That's the one I use primarily.
So get in touch with me if you have any questions, comments, etc. I'd love to get into discussions about this stuff. So I'll sign off. My name is Eric Normand. This has been my thought on functional programming. And as always, rock on.