Computer Science as Empirical Inquiry: Symbols and Search

In this episode, I excerpt from and comment on Allen Newell and Herbert Simon's 1975 ACM Turing Award lecture.

Transcript

Eric Normand: A.M. Turing concluded his famous paper on "Computing Machinery and Intelligence" with the words, "We can only see a short distance ahead, but we can see plenty there that needs to be done." Many of the things Turing saw in 1950 that needed to be done have been done, but the agenda is as full as ever.

Perhaps we read too much into his simple statement above, but we like to think that, in it, Turing recognized the fundamental truth that all computer scientists instinctively know: for all physical symbol systems, condemned as we are to serial search of the problem environment, the critical question is always, what to do next?

Hello, my name is Eric Normand. Welcome to my podcast. Today, I am reading from the ACM Turing Award lecture from 1975. Allen Newell and Herbert Simon were awarded it jointly. Their Turing Award lecture is called "Computer Science as Empirical Inquiry -- Symbols and Search."

As you may know, if you've listened before, I like to read from the biography. Of course, there are two biographies, because there are two different recipients of the 1975 ACM Turing Award. I'll read the first one, Allen Newell. He's actually the younger of the two. Well, I'll just get in there.

He was born in 1927. This award was given in 1975. He was 48 years old when he received it. He co-received it for making basic contributions to artificial intelligence, the psychology of human cognition, and list processing.

When they use the term basic here, basic contributions, I think that's a good thing. Not basic, like simple and easy, but basic like fundamental, which is true. They were part of the early crowd of the artificial intelligence group. OK, I'll read some more stuff from his biography. You should definitely read the whole thing, but I'll just read and comment on the parts that stood out to me.

"Newell is chiefly remembered for his important contributions to artificial intelligence research, his use of computer simulations in psychology, and his inexhaustible, infectious energy. His central goal was to understand the cognitive architecture of the human mind and how it enabled humans to solve problems.

"For Newell, the goal was to make the computer into an effective tool for simulating human problem-solving. A computer program that solved a problem in a way that humans did not or could not was not terribly interesting to him, even if it's solved that problem better than humans did."

This is an interesting statement, especially compared to the current approach, the current dominant paradigm in artificial intelligence and machine learning these days, which is deep neural networks. The way they learn is just through millions, if not billions of data points, and it's certainly not how we learn.

It's interesting that at the beginning, there were people there really searching for how we learned. AI was almost a mirror. It was a tool for learning about ourselves. I find that really interesting. That's always what drew me to AI. When I was still in school, that's what I found interesting, that it was telling me about myself.

"As part of this work on cognitive simulation, Newell, Simon and Shaw developed the first list-processing language, IPL, which, according to Simon, introduced many ideas that have become fundamental for computer science in general, including lists, associations, schemas, also known as frames, dynamic memory allocation, data types, recursion, associative retrieval, functions as arguments, and generators, also known as streams.

"John McCarthy's LISP, which became the standard language in the AI community after its development in 1958, incorporated these basic principles of IPL in a language with an improved syntax and a garbage collector that recovered unused memory."

I might have it in a quote somewhere else. IPL was more like an assembly language, but it did have all these other features, like a list data structure. You could allocate memory dynamically, have other data types. That's really interesting.

The other recipient, Herbert Alexander Simon, also known as Herb Simon, was born in 1916, which makes him 59 at the time of reception. He was Allen Newell's professor for a long time.

Apparently, it says that they were equals. It was just that Herb Simon was the older and already had a position. He had Newell come as a grad student so that they could work together, but they actually were peers.

The citation, as they call it, is the same, so I'll read from his biography. Pretty interesting character. "The human mind was central to all of Simon's work, whether in political science, economics, psychology, or computer science.

"Indeed, to Simon, computer science was psychology by other means. Simon's remarkable contributions to computer science flowed from his desire to make the computer an effective tool for simulating human problem-solving."

It's very similar to Allen Newell's goal of simulating the way humans think, not just solving the same kinds of problems that humans can solve. It has to do it in the same way.

The other thing is it talks about his work in economics. He won the Nobel Prize in economics. Herb Simon coined the term satisficing, and it was this breakthrough idea in economics: you can't optimize, you can only satisfice.

There's too much information for a person to process to choose the perfect solution, satisfying their needs with a full cost-benefit analysis. You have to use heuristics to make progress in the world. Often, what people do is choose whatever works: the first thing that comes to mind that might work, not optimizing.

He won the Nobel Prize for that. It seems like common sense today, but that's his influence on us. That's how we think these days.
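
To make satisficing concrete, here's a minimal sketch in Python. This is my own illustration, not Simon's formalism; the options and the threshold are made up. An optimizer must score every option, while a satisficer stops at the first one that clears a good-enough bar.

    # Satisficing vs. optimizing, sketched with invented scores.
    options = [3, 7, 12, 42, 8]  # scores of candidate choices

    def optimize(opts):
        return max(opts)  # must examine every option

    def satisfice(opts, good_enough=10):
        for o in opts:  # take the first acceptable one and stop
            if o >= good_enough:
                return o

    print(optimize(options))   # 42
    print(satisfice(options))  # 12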

"In addition to employing principles of heuristic problem solving, the Logic Theorist was an error-controlled feedback machine that compared the goal state with the current state and formed one of a small set of basic operations in order to reduce the difference between the two states. The Logic Theorist was a remarkable success.

"Simon Newell and Shaw elaborated on its basic principles in creating another renowned program, the General Problem Solver or GPS, in 1957 and '58. The GPS was not quite as universal as its name implied, but it was startlingly good at solving certain kinds of well-defined problems. Even more GPS, like LT, appeared to solve them in much the same ways that humans did."

This does go back -- we're talking 1957, 1958 -- to a different paradigm of artificial intelligence where they were much more closely tied to, say, the psychology department. They were doing psychological experiments to understand how people solve problems.

They would give the person a problem. They would ask them to solve it and talk. They would train them to talk about what they were thinking about at the time. They would try to figure out what steps did they take to get to the solution, and how do we generalize that?

He, Simon, said that, "We need a less God-like, and more rat-like, picture of the chooser." LT and GPS were intended to create just such rat-like models of how people actually solve problems in the real world. Rat-like meaning just mechanical and animal, and not some all-knowing, all-seeing entity.

He was a strong, even fierce advocate of the computer program as the best formalism for psychological theories, holding that the program is the theory. The fullest statement of this belief was the monumental text, "Human Problem Solving," authored by Simon and Newell in 1972, in which they introduced the notion of a program as a set of production systems or if-then statements.

Here again, we see that he was this proponent of the idea that the best way to understand a person or psychology -- people in general -- is through computer programs because you can actually formalize the thoughts, the thought processes in a way that other theories of mind do not.

He talks about this a little bit in the lecture, about how behaviorism and Gestalt theory and all these other theories are so vague. You can't really use them to make predictions or anything. You need some mechanism, something, and a computer is a good simulation of that.
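
Since they held that the program is the theory, a tiny sketch of a production system might help. This is my own illustration of the if-then idea, not the actual formalism from Human Problem Solving, and the tea-making domain is invented.

    # A minimal production system: if-then rules fire against a working
    # state until the goal is reached.
    state = {"goal": "make-tea", "kettle": "cold"}

    productions = [
        # (condition, action) pairs
        (lambda s: s["kettle"] == "cold",
         lambda s: s.update(kettle="boiling")),
        (lambda s: s["kettle"] == "boiling",
         lambda s: s.update(goal="done")),
    ]

    while state["goal"] != "done":
        for condition, action in productions:
            if condition(state):
                action(state)  # the matched rule's then-part runs
                break

    print(state)  # {'goal': 'done', 'kettle': 'boiling'}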

The flip side of this coin was his insistence that computer simulation was an empirical science that taught us new and valuable things about ourselves and our world. Simulation was not an exercise in elaborating tautologies. This is what the main topic of the talk is, computer science as empirical inquiry, that computer science is a science. It's a kind of science.

We'll get more into it in the paper. Last, but not least, Simon believed that organization and structure were critical. What his computer simulations simulated was not the actual physical operations of neurons in the brain, but rather the structure of problem-solving processes.

The computer program thus could be a structural model of the mind in action, not a model of its specific physical make-up. Two of the key conclusions he drew about the structure of our human mental processes are that they are hierarchical and that they are associative.

He believed that they have a tree structure with each node/leaf linked to a branch above it. Each leaf could either be one thing or a set of things, a list of things to be precise.

Since items on a list could call items on other lists, this model of the mind could work associatively within its basic hierarchical structure, creating webs of association amongst the branches of the mind's tree. He doesn't go too much into that in the talk.

It's been a while since I've done one of these lectures. I actually printed this out months ago and have been working through it. I've read it several times. I think it's well worth it.

Herb Simon is the only person, I think, to have won both a Nobel Prize and a Turing Award. He's kind of a big deal. He is an important person that we should recognize. Oh, I have it on my shelf.

He wrote a book that's called "Sciences of the Artificial." Also, definitely worth a read. Not very long. But as Alan Kay said, "He won a Nobel Prize and a Turing Award. Read his book. Come on. Why not?" He's an important figure and I think the big themes in this lecture are the topic of the book. I'll probably have some comments that I remember reading in the book.

I didn't re-read the book for this. I probably should have, but the break was already getting too long, so I didn't have a chance to read it again.

Let's get into it. Just some information before I start. I don't read the whole thing. It's actually pretty long. It's 13 pages. I just pick out phrases and sentences, maybe whole paragraphs that I think are worth commenting on, that I have something to say about.

Another thing is this gets pretty wordy. He's not a concise writer. There's a lot of lilt to what he says, a lot of intellectual flourishes of his speech. I don't know who's actually doing the writing here, but it seems like someone who's used to being professorial.

I did my best in some places to skip words and sometimes whole phrases because they didn't really add anything. It is going to get long. It's already going to be too long. I did my best, let me put it that way.

Some people are super succinct. You can find one sentence that says it all, and I just have to read that. Other people are like, "OK." Like, "The point starts at the top here, and now, I'm at the bottom and I see the other half of the point." He made this long thing in between, like, "Can I skip it?" It's hard to figure out. OK, so let's just start at the beginning.

"The machine -- not just the hardware, but the programmed living machine -- is the organism we study."

He's still in the first paragraph here. He's in the introduction. When most people who have won the award start, they often refer to the past lectures, trying to put some spin, some perspective, on what they can tell us. I'm going to skip the part where he explicitly talks about the other lectures, but he's contrasting his view with the other views.

For instance, Dijkstra had that famous quote in his Turing lecture that, "Computer science is not about the machines, it's about the programs." Just like astronomy is not about telescopes, computer science is not about computers. Something like that.

Well, Simon and Allen Newell, they're saying that it's the running machine with software on it, the behavior of the machine, which I find is very different from the people who have come before. It's not about the software, the abstractions. It's not about the hardware and how you do stuff efficiently.

We've seen people talk about that, but their view is that it's the running, living programmed hardware. I love that, because it's a totally new view from his book where he really talks about how it's a new thing to study.

There are properties of it that we don't understand and that we can empirically determine. For instance, how do search algorithms work, and things like that. Those are the kinds of things we can empirically study by running programs on computers.

Here he goes into the main topic. "Computer science is an empirical discipline. Each new machine that is built is an experiment. Actually constructing the machine poses a question to nature, and we listen for the answer by observing the machine in operation and analyzing it by all analytical and measurement means available."

He continues, "Each new program that is built is an experiment. It poses a question to nature and its behavior offers clues to an answer. Neither machines nor programs are black boxes. They are artifacts that have been designed, both hardware and software, and we can open them up and look inside."

Here, he's laying out the whole thesis that this thing that we've created, these machines with software that runs on them, they're like a whole new world that we can do experiments on.

"As basic scientists, we build machines and programs as a way of discovering new phenomena and analyzing phenomena we already know about. Society often becomes confused about this believing that computers and programs are to be constructed only for the economic use that can be made of them.

"It needs to understand that the phenomena surrounding computers are deep and obscure, requiring much experimentation to assess their nature. It needs to understand that, as in any science, the gains that accrue from such experimentation and understanding pay off in the permanent acquisition of new techniques.

"And that, it is these techniques that will create the instruments to help society in achieving its goals. Our purpose here, however, is not to plead for understanding from an outside world. It is to examine one aspect of our science, the development of new basic understanding by empirical inquiry."

Let's talk a little bit about empiricism and computer science. I actually read two books at the same time. I read Herb Simon's "Sciences of the Artificial" and another book called "Algorithms to Live By." I felt that they complemented each other. Algorithms to Live By gave some good examples of phenomena that computer science elucidates. One of them was why a bigger load of laundry takes longer than a smaller load.

It's quite simple: if you sort your clothes, sorting grows faster than order N. Order N log N is the best you can do. The bigger your load, the longer it takes, and not linearly. It grows more than linearly.

Smaller batches of clothing should be faster to do. Of course, we probably don't sort our clothes anymore so much like we used to. You get the idea that this can teach us why certain things in our everyday lives take longer. Why is it easy to sort a small set of, let's say, playing cards versus the whole set?

The small set, you can see the whole thing and keep it all in your head, and boom, right in order. If there's 52 of them, you can't see them all. You're moving around a lot. It takes a long time.
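
You can see that more-than-linear growth by counting comparisons. Here's a quick sketch of my own, using merge sort (which is order N log N), with the playing-card sizes from above:

    # Count comparisons while merge sorting one suit vs. a whole deck.
    import random

    def merge_sort(items, counter):
        if len(items) <= 1:
            return items
        mid = len(items) // 2
        left = merge_sort(items[:mid], counter)
        right = merge_sort(items[mid:], counter)
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            counter[0] += 1  # one comparison
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    for n in (13, 52):  # one suit of cards vs. the whole deck
        counter = [0]
        merge_sort(random.sample(range(1000), n), counter)
        print(n, "cards:", counter[0], "comparisons")
    # Four times the cards takes well over four times the comparisons.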

I think that that's the thing that he's talking about. It might be a very simple, basic example. If you extrapolate that: now, let's see if we can explain what's happening in a human mind by writing a program that does the same thing.

We can understand that program and what it's doing. By analogy, what must be happening in that mind, and why certain things work better in the program, does that match up with what we see in the mind? It's a probe into complex systems that are hard to understand by themselves, that we have this tool of simulation that we couldn't have before.

Another thing is he's going to go deeper into science and also into artificial intelligence. This is maybe one of the best short summaries of artificial intelligence at that time that I've ever read, so we'll get into that too.

Finally, I just want to say that he's talking about...He's going to examine one aspect of our science, the development of new basic understanding by empirical inquiry. He's going to give some examples, some illustrations. These are from their own work. They were big in artificial intelligence so a lot of them are going to be artificial intelligence examples.

Time permits taking up just two examples. The first is the development of the notion of a symbolic system. The second is the development of the notion of heuristic search.

Both conceptions have deep significance for understanding how information is processed and how intelligence is achieved, however, they do not come close to exhausting the full scope of artificial intelligence, though they seem to us to be useful for exhibiting the nature of fundamental knowledge in this part of computer science.

They're two examples that they're going to give, symbolic systems and heuristic search.

"One, symbols and physical symbol systems. One of the fundamental contributions to knowledge of computer science has been to explain, at a rather basic level, what symbols are. This explanation is a scientific proposition about nature. It is empirically derived with a long and gradual development."

This is a big, mysterious statement. It took me a long time to get this. Symbols: I think he's referring not to symbols in general, but to symbols as in LISP symbols. And he claims...they claim, sorry, that the same kinds of things are happening in our minds, and we'll get to that.

A LISP symbol is just a string, a different type, but it's just a string of characters, and it can represent something. We'll get to that in a second. The important thing is that it's empirically derived. They've done a certain number of experiments that got to that point, that it wasn't just arbitrary. It wasn't like, I don't know, "We just invented this thing and it works."

"Symbols lie at the root of intelligent action, which is, of course, the primary topic of artificial intelligence. For that matter, it is a primary question for all computer science. For all information is processed by computers in the service of ends, and we measure the intelligence of a system by its ability to achieve stated ends in the face of variations, difficulties, and complexities posed by the task environment.

"This general investment in computer science in attaining intelligence is obscured when the task being accomplished are limited in scope, for then the full variation in the environment can be accurately foreseen.

"It becomes more obvious as we extend computers to more global, complex, and knowledge-intensive tasks, as we attempt to make them our own agents, capable of handling on their own the full contingencies of the natural world."

He's just saying that, sometimes, you have these simple examples, and they're so small that it doesn't look intelligent. You can see that it's just obviously a little mechanism. As you have to deal with unforeseen things, you need more intelligence. The decisions have to be smarter. Those are the kinds of systems that require a symbol system.

"Our understanding of the system's requirements for intelligent action emerges slowly. It is composite, for no single elementary thing accounts for intelligence in all its manifestations. There is no 'intelligence principle,' just as there is no 'vital principle' that conveys by its very nature the essence of life.

"But the lack of a simple deus ex machina does not imply that there are no structural requirements for intelligence. One such requirement is the ability to store and manipulate symbols."

He's trying to make an analogy. They are trying to make an analogy -- sorry, I keep doing that -- that there's no one thing that makes something alive. There's no one thing that makes something intelligent. Some things are required. You can't just throw it all away and say, "There's no way to describe it." There's a requirement, and that is to store and manipulate symbols.

Now, he's going to go on a little detour and talk about science, because he has to give a little background.

"All sciences characterize the essential nature of the systems they study. These characterizations are invariably qualitative in nature, for they set the terms within which more detailed knowledge can be developed."

They define the terms qualitatively, and this lays the groundwork for what you can talk about in that field. "A good example of a law of qualitative structure is the cell doctrine in biology, which states that the basic building block of all living organisms is the cell. The impact of this law on biology has been tremendous, and the lost motion in the field prior to its gradual acceptance was considerable."

This is a really interesting statement here. This was 1975, remember. I learned this cell doctrine, cell theory in biology class in high school, that all living things are made of cells.

There's like three pieces to it, but it's basically that everything is made of cells, cells are really small, and nothing is smaller than a cell, something like that. The first thing that people nowadays say when you bring up this cell theory is, "What about viruses? Aren't viruses alive? They have DNA. They multiply. They're alive."

We have a different view today that doesn't always mesh with the cell theory. What I think is difficult is that, in science, you can have a theory that's not 100 percent right, and it's still productive. It's still useful.

He says that the impact of this law on biology has been tremendous because, before that, we had no idea how the body worked. Once you could break stuff up into cells, you can start doing cell physiology and understand, what's happening inside one cell, that that's an important thing to understand. It really opened up the field.

Does that mean that they didn't include viruses as alive? Yes, they did not. That's fine. There is still a lot to learn with this frame. I think that it takes a lot of science education to realize that these theories are just mental toys.

They are models that include and exclude parts of the phenomena. They're just useful for thinking. They're useful for framing a problem, for deciding what to work on next, things like that.

I just wanted to talk about that because he brought up cell doctrine, and I thought it's a good chance to talk about that aspect of science, especially when we're now almost 50 years after this, two paradigms later in artificial intelligence. It doesn't invalidate what they were doing, that there's now a different dominant paradigm.

"The theory of plate tectonics asserts that the surface of the globe is a collection of huge plates -- a few dozen in all -- which move against, over, and under each other into the center of the Earth, where they lose their identity."

Just another example of a big theory that's qualitative and foundational. Once you have that, it paints a totally different picture of what's going on, in this case, on the Earth with the continents.

"It is little more than a century since Pasteur enunciated the germ theory of disease. The theory proposes that most diseases are caused by the presence and multiplication in the body of tiny, single-celled, living organisms, and that contagion consists in the transmission of these organisms from one host to another."

Another example. Another one, the doctrine of atomism offers an interesting contrast to the three laws of qualitative structure we have just described.

"The elements are composed of small, uniform particles differing from one element to another, but because the underlying species of atoms are so simple and limited in their variety, quantitative theories were soon formulated, which assimilated all the general structure in the original qualitative hypothesis."

OK, just talking about atoms and elements, how this leads to chemistry. Now, he's going to conclude about this. "Laws of qualitative structure are seen everywhere in science. Some of our greatest scientific discoveries are to be found among them. As the examples illustrate, they often set the terms on which a whole science operates."

He's going to lay out two of those for computer science. Artificial intelligence, really.

"Let us return to the topic of symbols, and define a physical symbol system. A physical symbol system consists of a set of entities, called symbols, which are physical patterns that can occur as components of another type of entity called an expression or symbol structure. Thus, a symbol structure is composed of a number of instances of symbols related in some physical way such as one token being next to another.

"At any instance of time, the system will contain a collection of these symbol structures. The system also contains a collection of processes that operate on expressions to produce other expressions, processes of creation, modification, reproduction and destruction.

"A physical symbol system is a machine that produces through time an evolving collection of symbol structures." This is a physical symbol system. Over time, there is a collection of expressions that changes over time.

"Two notions are central to the structure of expression, symbols and objects, designation and interpretation. Designation. An expression designates an object if given the expression, the system can either affect the object itself or behave in ways dependent on the object."

It says affect. I think they might mean effect, as in the system can have an effect on the object itself. Either way, a symbol can designate something, meaning it can refer to or represent something, if the system can change it or behave in ways dependent on it.

Like if you say, "That cat," the system can maybe poke the cat, right? It somehow knows that that symbol for cat, that expression that cat can also be poked with a rod that's affecting it. Then, if you can behave in ways depending on the object so if the cat moves, maybe you can...your expression about what about that cat changes.

Interpretation. This is the second thing. "The system can interpret an expression if the expression designates a process, and if, given the expression, the system can carry out the process." Now, expressions can designate a process, and he's calling it interpretation when you run it.

We get a lot of LISP vibes from this: you can make expressions out of symbols and then run them.
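
Here's a hedged sketch of the two notions, reusing the poke-the-cat example. The world model and the function names are my own illustration, not anything from the lecture.

    # Designation: the system can affect the object a symbol stands for,
    # or behave in ways that depend on it.
    world = {"cat": {"place": "mat", "awake": False}}

    def poke(name):
        world[name]["awake"] = True  # affect the object itself

    def look_at(name):
        # behavior that depends on the designated object
        return "looking at the " + world[name]["place"]

    # Interpretation: an expression that designates a process can be run.
    expression = (poke, "cat")
    process, arg = expression
    process(arg)  # the system carries out the designated process

    print(world["cat"], look_at("cat"))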

Now, he's got this hypothesis. "The physical symbol system hypothesis. A physical symbol system has the necessary and sufficient means for general intelligent action. By necessary we mean that any system that exhibits general intelligence will prove upon analysis to be a physical symbol system.

"By sufficient, we mean that any physical symbol system of sufficient size can be organized further to exhibit general intelligence. By general intelligent action, we wish to indicate the same scope of intelligence as we see in human action."

I don't want to get into the debates about whether general intelligence is possible or even a good term. General just...it accepts too much, right? They do define it, saying it's the same scope that we see in human action: the ability to deal with a lot of nuance and variability in our environment and stuff.

Those things always bug me, because it's very rare...yes, humans can play the violin really well, and they can play chess really well, and they can do acrobatics, and not fall down, let's say. Yes, it's true, but it's rare that you have a single person that does all three very well. [laughs] It's not fair that you expect the same machine to do all three.

I worked in robotics, so I have a lot of those kinds of feelings, like, "It's not fair. Humans fall down all the time." I wanted to talk about this designation again. That's really where the meaning comes from. In this view, the thing that connects the symbol to the real world, to an object or something in the real world, is the ability to affect the thing in the real world, to poke the cat, or to behave differently when the object changes.

If the cat moves, maybe your head turns because you recognize the cat. It's a very mechanistic view of meaning, but I can't think of a better one. Meaning is one of what Minsky calls suitcase words, where you just pack whatever you want in there. The meaning we often think of is human meaning, with a lot of emotional investment in the thing.

He's excluding, let's say, the medium on which the symbol system runs. That's not quite as important as the fact that it is a symbol system. Maybe your symbol is love, or your symbol is the word love, or your symbol system is oxytocin, or some hormone that flows and sends a message to all cells: "Oh, this is what we're doing now."

It's a much more basic one than the word, and one that exists in more animals and things, but it's serving the same purpose. Perhaps that's...I don't know. I don't know what else to say about that. "We now need to trace the development of this hypothesis and look..." Oh, wait, I missed something.

This is an empirical hypothesis. The hypothesis could indeed be false. We now need to trace the development of this hypothesis and look at the evidence for it. He's stating something pretty strong here.

The hypothesis itself is pretty strong, necessary and sufficient for intelligence, it's very strong, but it's falsifiable. If you could construct an intelligent system that does not have a symbol system, then it wouldn't be necessary.

Or if you could make a symbol system that was complex enough, of sufficient size as he says, but that you couldn't organize into a general intelligence, then it's also falsified. It's still a hypothesis. We haven't developed a general intelligence system.

"A physical symbol system is an instance of a universal machine. Thus the symbol system hypothesis implies that intelligence will be realized by a universal computer. However, the hypothesis goes far beyond the argument, that any computation that is realizable can be realized by a universal machine, provided that it is specified."

Universal machine, he's talking about a Turing machine. He's saying that yes, intelligence is runnable on a Turing machine. I don't know if that is still something that people argue about. I've accepted it for myself that a Turing machine could do what I do if we knew how to program it.

I think some people still hold out that there's something unique in the human mind, perhaps. I know some philosophers used to, but I'm not sure if that's the case anymore. I think people use computers all the time now and they're much more, I don't know. I don't know.

I don't think there's anything special, and I think that the people in artificial intelligence, at least at this time, did not think that the human mind was special in a way that violated Turing universality.

He's saying another thing: that the hypothesis goes further than just saying that the human mind can be simulated by a Turing machine. It's talking about its structure. How would it have to work, if it were simulated by a Turing machine?

"For it asserts, specifically, that the intelligent machine is a symbol system, thus making a specific architectural assertion, about the nature of intelligence systems. It is important to understand how this additional specificity arose."

OK. That's what I just said. "The roots of the hypothesis go back to the program of Frege and of Whitehead and Russell, for formalizing logic, putting the notions of proof and deduction on a secure footing. Logic, and by incorporation all of mathematics, was a game played with meaningless tokens according to certain purely syntactic rules.

"Thus progress was first made by walking away from all that seemed relevant to meaning and human symbols. We could call this the stage of formal symbol manipulation." This does seem weird that, to understand intelligence, human intelligence better, we got rid of all meaning. [laughs]

We started talking about A's and B's, aRb, stuff like that. It's just all symbols. You would convert some English sentence that had a lot of meaning into some logical statement with just logical variables. You do some manipulation and then you convert the variables back to the things they represented in the real world.

Seems weird, right? It seems like you would want to know more about the thing, not abstract it down to a letter. But that was the first step.

"This general attitude is well reflected in the development of Information theory. It was pointed out, time and again, that Shannon had defined a system that was useful only for communication and selection, which had nothing to do with meaning."

I don't know if you know this, but some people tried to change the name from information theory because, at that time, information was thought of as something that had a lot of meaning. But the way Shannon formulated it was correct. It was the right way to do it.

It was just zeros and ones and it wasn't about meaning or any information like what people thought of as information at that time. Now, we think of it like that. Yeah, sure. It's just bits.

Back then it was like, "Well, that's not really information. You can't learn anything from it. It's all about the amount of entropy in the system." That's it. Makes sense. Maybe the word is wrong, but it's changed now.

First nail in the coffin, here we go.

"A Turing machine consists of two memories, an unbounded tape and a finite state control. The tape holds data. The machine has a very small set of proper operations, read, write, and scan operations, on the tape. The read operation is not a data operation but provides conditional branching to a control state as a function of the data under the read head.

"As we all know, this model contains the essentials of all computers in terms of what they can do, though other computers with different memories and operations might carry out the same computations with different requirements of space and time.

"The model of a Turing machine contains within it the notions both of what cannot be computed and of universal machines. The logicians Emil Post and Alonzo Church arrived at analogous results on undecidability and universality. In none of these systems is there, on the surface, a concept of the symbol as something that designates."

This is another step in mechanizing thought, mechanizing decisions, mechanizing all this stuff. It's now all about zeros and ones on this tape. There's nothing about something that designates. There's no meaning.

"The data are regarded as just strings of zeros and ones. Indeed, that data can be inert is essential to the reduction of computation to physical process." There's no meaning anymore in this stuff. It's all zeros and ones, and basic operations of reading and writing and moving the tape.

"What was accomplished at this stage was half the principle of interpretation, showing that a machine can be run from a description. Thus, this is the stage of automatic formal symbol manipulation." We can represent program as data. That's the second step.

"With the development of the second generation of electronic machines in the mid-'40s, came the stored program concept. This was rightfully hailed as a milestone, both conceptually and practically. Programs now can be data and can be operated on as data.

"The stored program concept embodies the second half of the interpretation principle. The part that says that the system's own data can be interpreted, but it does not yet contain the notion of designation of the physical relation that underlies meaning." That was step three. Now, the last step, I think.

"The next step taken in 1956, was list processing. The contents of the data structures were now symbols in the sense of our physical symbol system, patterns that designated that had reference.

"That this was a new view was demonstrated to us many times in the early days of list processing when colleagues would ask where the data were, that is, which lists finally held the collections of bits that were the content of the system. They found it strange that there were no such bits, there were only symbols that designated yet other symbol structures."

They just won this award. They're allowed to talk about their work, but they're saying that their list processing, the IPL language was a breakthrough on the way to this hypothesis. They give a linear progression of logic, then Turing machine, then stored program computer and then list processing. All four of those steps lead you to this hypothesis.

"List processing is simultaneously three things in the development of computer science. One, it is the creation of a genuine dynamic memory structure in a machine that had heretofore been perceived as having fixed structure.

"It added to our ensemble of operations those that built in modified structure in addition to those that replaced and changed content. Two, it was an early demonstration of the basic abstraction that a computer consists of a set of data types and a set of operations proper to these data types.

"Three, list processing produced a model of designation, thus defining symbol manipulation in the sense in which we use this concept in computer science today." This is where he's...or they...Sorry.

They are presenting their work as a tangent to this hypothesis that they're presenting. They're just describing IPL, that you could have this dynamic memory structure, you can allocate little linked list nodes, and build lists dynamically, and change them and make interesting structures.

That you didn't have to have a fixed set of data that was in data statements at the end of your program, like a lot of languages did. The idea that it had different data types with operations that operated on those data types, that's interesting too. Of course, this model of designation, which they've already talked about.
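The list-processing ideas are easy to sketch with cons cells. This is a minimal illustration of mine, not IPL's actual instruction set: nodes are allocated on demand and rewired at runtime, instead of being a fixed block of data declared up front.

    NIL = None

    def cons(head, tail):
        return [head, tail]   # allocate a fresh two-field node

    lst = cons("a", cons("b", NIL))  # build a list dynamically
    lst[1] = cons("x", lst[1])       # splice in a node: a, x, b

    def items(cell):
        while cell is not NIL:
            yield cell[0]
            cell = cell[1]

    print(list(items(lst)))  # ['a', 'x', 'b']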

"The conception of list processing as an abstraction created a new world in which designation and dynamic symbolic structure were the defining characteristics. We come now to the evidence for the hypothesis, that physical symbol systems are capable of intelligent action, and that general intelligent action calls for a physical symbol system.

"The hypothesis is an empirical generalization and not a theorem." He said that so many times. "Our Central aim however is not to review the evidence in detail, but to use the example before us to illustrate the proposition that computer science is a field of empirical inquiry, hence we will only indicate what kinds of evidence there is and the general nature of the testing process."

I said before, that this lecture is like the best short summary of artificial intelligence, this paradigm of artificial intelligence, that I've ever read.

That's where it starts, right here, is it's going to be a lot of AI from now on. It's interesting because I feel like there's quite a lot more to say on his main thesis. They seem to indicate that this is necessary. Although I think it's very, very interesting, I don't think it's such a great support for what their main thesis is, that computer science is empirical.

I do think that they just wanted to show their work. They wanted to illustrate with their work. They spend, I don't know, five or six pages on AI, and a lot of it is their work. "20 years of work in AI has seen a continuous accumulation of empirical evidence of two main varieties. The first addresses itself to the sufficiency of physical symbol systems for producing intelligence.

"The second kind of evidence addresses itself to the necessity of having a physical symbol system, wherever intelligence is exhibited. The first is generally called artificial intelligence, the second is research in cognitive psychology." He's dividing their work into two fields. One is more, how do we build these systems that are intelligent? How do we make our chess program better, that kind of thing?

The other is research into humans. How do they think, what kinds of models can we develop, that kind of thing. "The basic paradigm for the initial testing of the germ theory of disease was identify a disease, then look for the germ. An analogous paradigm has inspired much of the research in artificial intelligence.

"Identify a task domain calling for intelligence, then construct a program for a digital computer that can handle tasks in that domain.

"The easy and well-structured tasks were looked at first. Puzzles and games, operations, research problems of scheduling and allocating resources, simple induction tasks. Scores, if not, hundreds of programs of these kinds have by now been constructed, each capable of some measure of intelligent action in the appropriate domain."

This is an interesting analogy he's making that if you had a disease, look for the germ. This is more like, "If you have a problem that humans can solve, try to solve it with a computer." [laughs]

Then, of course, symbol system. This is, again, the first kind, addressing sufficiency. Can a physical symbol system exhibit intelligence? That's sufficiency. Is it sufficient for exhibiting intelligence? Then the second part, which is cognitive psychology, is the necessity. We'll look at that. He hasn't gotten to that yet.

"From the original tasks, research has extended to build systems that handle and understand natural language in a variety of ways, systems for interpreting visual scenes, systems for hand-eye coordination, systems that design, systems that write computer programs, systems for speech understanding. The list, if not is, if not endless, at least very long.

"If there are limits beyond which the hypothesis will not carry us, they have not yet become apparent. Up to the present, the rate of progress has been governed mainly by the rather modest quantity of scientific resources that have been applied, and the inevitable requirement of a substantial system building effort for each new major undertaking."

[pause]

Eric: He's just saying that it's gotten more complicated, there's a long list of programs that do somewhat intelligent stuff. Of course, we know in the future that these things are still hard. [laughs] Interpreting visual scenes is not a solved problem, hand-eye coordination, designing, writing computer programs.

These are all things that we still find are not easy to write. They haven't been solved, but perhaps they did find little pieces that made some sense. "There has been great interest in searching for mechanisms possessed of generality and for common programs performing a variety of tasks.

"This search carries the theory to a more complete characterization of the kind of symbol systems that are effective in artificial intelligence." After writing all these programs, you start seeing some patterns, right? You want to find the pieces and parts that you can put together and reuse.

"The search for generality spawned a series of programs designed to separate out general problem-solving mechanisms from the requirements of particular task domains.

"The General Problem Solver was perhaps the first of these, while among its descendants are such contemporary systems as Planner and Conniver. More and more, it becomes possible to assemble large intelligent systems in a modular way from such basic components."

Sometimes I think that with the kinds of compute resources that we have available today, if we were to actually go back and rewrite these original systems, we might actually get a lot more out of them.

I wonder, though. I wonder if that isn't like what graduate students do these days. Fire up an EC2 cluster and run a modern version of General Problem Solver on it. I think a lot of what they learned was that.

Knowledge was one of the big constraints. You would look at a thing and it would get stuck and you'd say, "Why is this not able to solve this problem?" And it turned out that, "Oh, the system needs to learn that. The system didn't know that a whale was a mammal."

We need to write that down. And even then, it runs a little longer, and then, "Oh, it needs to know the density of water at sea level." Let's write that in there.

"Oh, it needs to know that humans can't breathe under water." Let's write that. It becomes that we know billions of facts about the world, if you want to call them facts.

Then, once you try to solve problems, it's not really the logic [laughs] and the reasoning. It's that you don't know enough; your AI doesn't know enough. There's actually a cool project called Cyc, C-Y-C, that has the goal of creating a database of all these facts.
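
Here's a tiny sketch of that knowledge bottleneck. It's my own illustration, nothing like Cyc's actual representation: the solver can only chain through facts someone has written down.

    # Facts as (subject, relation, object) triples.
    facts = {
        ("whale", "is-a", "mammal"),
        ("mammal", "breathes", "air"),
    }

    def breathes(creature):
        for s, rel, o in facts:
            if s == creature and rel == "breathes":
                return o
        for s, rel, o in facts:
            if s == creature and rel == "is-a":
                return breathes(o)  # chain through is-a links
        return None  # stuck until someone writes the fact down

    print(breathes("whale"))    # air
    print(breathes("dolphin"))  # None -- the fact base doesn't know yet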

"If the first burst of research simulated by germ theory consisted largely in finding the germ to go with each disease, subsequent effort turn to learning what a germ was. In artificial intelligence, an initial burst of activity aimed at building intelligent programs for a wide variety of almost randomly selected tasks, is giving way to research aimed at understanding the common mechanisms of such systems."

Looking from 46 years in the future, I don't know if we got much farther. Sorry to say.

Now, he's talking about the other side of this, the part two, which is talking about whether it's necessary to have a symbol system, using humans, human minds, as the subject of study. "The results of efforts to model human behavior with symbol systems are evidence for the hypothesis, and research in artificial intelligence collaborates with research in information-processing psychology.

"Explanations of man's intelligent behavior in terms of symbol systems has had success over the past 20 years, to the point where information-processing theory is the leading contemporary point of view in cognitive psychology." These are broad statements but pretty strong.

The last point is that information-processing theory is the leading view in cognitive psychology. That's fascinating. I don't know enough about cognitive psychology to be able to evaluate that. This idea that the model of the computer is information processing is so clear to us now that it seems basically self-evident.

That it's influencing how we understand the brain, and how we understand, say, the human senses, as information processing is an interesting development. It's another way that the computer is influencing other sciences.

"Research and information-processing psychology involves two main kinds of empirical activity. The first is the conduct of observations and experiments on human behavior in tasks requiring intelligence. The second is the programming of symbol systems to model the observed human behavior.

"The psychological observations lead to hypotheses about the symbolic processes the subjects are using, and these go into the construction of the programs. Thus, many of the ideas for the basic mechanisms of GPS were derived from careful analysis of the protocols that human subjects produced while thinking aloud during the performance of a problem-solving task."

Too many words, man. Too many words. This, I feel, is a little weak, that he's referring again to GPS. What's weak about it is he clearly divided the hypothesis in two, and said that AI was working on one side of the hypothesis and cognitive psychology was working on the other half.

Now, he's saying that cognitive psychology was influencing the other half, and I just lose the thread. Well...is it really divided? I feel like it might be better not to have divided them up. I don't know if that's a meta-commentary or what about his argument, but I think what's more interesting is the back and forth, right? The back and forth between cognitive psychology and artificial intelligence.

Cognitive psychologists learn something about how humans think, then the AI folks write it into their programs. That leads to a better program, and then with that better program, you generalize it a little bit and you can tell the cognitive psychologists, "Hey, look for something like this, because that seems to work really well in our programs," and then maybe they find it.

I think that's much more interesting than like the split. OK, the absence of specific competing hypotheses as to how intelligent activity might be accomplished...Oh, sorry, this is one of those ones where I skipped a lot of words. He's talking about other evidence. It's negative evidence.

"The absence of specific competing hypotheses as to how intelligent activity might be accomplished. There is a continuum of theories usually labeled behaviorism to those usually labeled Gestalt theory. Neither stands as a real competitor to the symbol system hypothesis. Neither behaviorism nor Gestalt theory has demonstrated that the explanatory mechanisms account for intelligent behavior and complex tasks.

"Neither theory has anything like the specificity of artificial programs." Basically, he's saying, you can't translate a behaviorist model of the mind into something they can run on a computer. [laughs] To even conceive of what that would look like is making me chuckle, because behaviorist treats the organism as a black box.

They're super into the idea that, well, we can't really know what's happening in your head. It's only self-reported. We're just going to pretend like it doesn't exist or even postulate that it doesn't exist, and just look at the inputs and the outputs.

Like, we give you a question. You answer the question right, we reward you. If you answer the question wrong, we shock you or something, punish you. Of course, yeah, but how does it happen? How do you actually come up with the right answer? It's not even on the table. I think he's absolutely right there.

"Knowing that physical symbol systems..." Oh, he's gone on to second part. I should have said that. This is Part 2, heuristic search. The last section was all about physical symbol systems and this hypothesis that it was that symbol systems are sufficient and necessary for intelligence. This is heuristic search.

He's got a different hypothesis. They have different numbers. "Knowing that physical symbol systems provide the matrix for intelligent action does not tell us how they accomplish this. Our second example of a law of qualitative structure in computer science addresses this latter question, asserting that symbol systems solve problems by using the processes of heuristic search.

"This generalization, like the previous one, rests on empirical evidence, and has not been derived formally from other premises."

He keeps saying that. I think that that's part of his point. It wasn't like Euclid coming up with the definitions of point and line and then deriving the rest of the stuff from that. This is actually: we made a program, we ran it, we measured how good it was at chess, and then we did it again, and we did it again, and we learned stuff from it.

This heuristic stuff, this is what he won the Nobel Prize for, that people use heuristics in their economic activity. They don't optimize or maximize. They satisfice, meaning they have some rule that they follow or a set of rules, and they pick the one that satisfies the need that they have. They're not really trying to maximize all the time.

In fact, you can't, because to maximize you'd have to be able to try out too many options, at least in your head. Simulate too many options and you would just have analysis paralysis. You would never get to actually take action.

Another thing I want to say, this is more of a personal thing: I was working as a computer scientist, a grad student working in a lab. I was able to generate a new experiment basically every day. I'd write some programs, some code, usually just modifying an existing one during the day, I'd run it at night, and then in the morning, I'd look at the results.

I've experienced this firsthand, that you do feel like you are learning, and you're able to formulate hypotheses and falsify them by the next day. It's very, very fast compared to other sciences. My wife, however, was a biologist, and she was generating one data point every few weeks sometimes.

We would talk about how discouraging it was to have so little data after working for so long, whereas I was generating data every day and throwing it away because it was not...It's like, "No, that's not the answer."

We're in a lucky situation where we have a system where we can actually operate so quickly because...Really because the system is so malleable, and the nature of the scientific discovery process is just different. You're not trying to characterize.

What made her work difficult was that it was a cell from a certain part of the brain, and so she had to get at the cell and get a good reading. It was hard, just very physically demanding to work these little scaled-down probes. It's really hard work. Man, computers, you just fire up a hundred machines if you need them these days.

Here's the heuristic search hypothesis. "The solutions to problems are represented as symbol structures, a physical symbol system exercises its intelligence in problem-solving by search -- that is, by generating and progressively modifying symbol structures until it produces a solution structure." I'm going to go through that again. "The solutions to problems are represented as symbol structures."

Remember, these are just expressions as we know them in LISP. "A physical symbol system exercises its intelligence in problem-solving by search -- that is, by generating and progressively modifying symbol structures until it produces a solution structure."

Search, it's looking for the answer by taking a simple structure and generating new ones that are perhaps better than the old one, hopefully, better. "Physical symbol systems must use a heuristic search to solve problems because such systems have limited processing resources.

"Computing resources are scarce relative to the complexity of the situations with which they are confronted. The restriction will not exclude any real symbol systems in computer or human in the context of real tasks."

This is just a summary of what he won the Nobel Prize for. What he's basically saying is, any physical system is limited. Like, in the Turing machine, it had an infinite tape and you could give it infinite time to find the answer to write all the digits of something.

He's trying to say, no, it needs to be practically limited, and it doesn't matter where you put the limit, but it has to, and it's always going to be more limited than the situation in which the system finds itself.

"Since ability to solve problems is generally taken as a prime indicator that the system has intelligence, it is natural that much of the history of artificial intelligence is taken up with attempts to build and understand problem-solving systems." Makes sense.

Now, he's going to talk about problem-solving. "To state a problem is to designate one, a test for a class of symbol structures -- solutions of the problem -- and two, a generator of symbol structures -- potential solutions. To solve a problem is to generate a structure, using two, that satisfies the test of one."

You have two pieces, a generator that's generating potential solutions, and you have a checker that is checking whether those potential solutions are actual solutions. It's a test. I called it a checker, it's a test. "A symbol system can state and solve problems because it can generate and test."

This is a more structural hypothesis. When a general intelligence system, when a physical symbol system, is solving problems, it must be doing something like generate and test.
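
Here's the generate-and-test decomposition as a runnable sketch. The toy problem is my own, not from the lecture: a generator proposes candidate structures, and a test recognizes solutions.

    # To state a problem: a test for solutions plus a generator of
    # potential solutions. To solve it: generate until the test passes.
    from itertools import count

    def generator():
        for n in count(0):  # candidates: 0, 1, -1, 2, -2, ...
            yield n
            if n:
                yield -n

    def test(x):
        return x * x == 289  # the toy problem: find a square root of 289

    def solve(gen, test):
        for candidate in gen():
            if test(candidate):
                return candidate

    print(solve(generator, test))  # 17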

"A simple test exists for noticing winning positions in chess. Good moves in chess are sought by generating various alternatives and painstakingly evaluating them with the use of approximate measures that indicate that a particular line of play is on the route to a winning position."

You're generating, let's say all possible moves. Can this guy move here? Yes. Is that a good move? And then you evaluate whether that's going to lead to a win.

"Before there can be a move generator for a problem, there must be a problem space. A space of symbol structures in which problem situations including the initial and goal situations can be represented. How they synthesize a problem space and move generators appropriate to that situation is a question that is still on the frontier of artificial intelligence research."

Framing the problem. Usually, the problem is framed by the writers of the program, and for a long time that was what you did as an artificial intelligence programmer. You're like, "What if we represented the problem this way?" And you got one percent better results.

"During the first decade or so of artificial intelligence research, the study of problem-solving was almost synonymous with the study of search processing. Consider a set of symbol structures, some small subset of which are solutions to a given problem. Suppose further, that the solutions are distributed randomly through the entire set."

You have this big space and some of them randomly distributed are the solutions to the problem. "No information exists that would enable any search generator to perform better than a random search, then no symbol system could exhibit more intelligence than any other in solving the problem." This is if it's random, right? If it's random, all you can do is just start looking, right?

Just pick up the first one, pick up the second one, pick up the third one; it could be anywhere. Why does it matter which way you do it? "A condition, then, for intelligence is that the distribution of solutions be not entirely random, that the space of symbol structures exhibit at least some degree of order and pattern.

"A second condition is that pattern in the space of symbol structures be more or less detectable. A third condition is that the generator be able to behave differentially, depending on what pattern it detected. There must be information in the problem space and the symbol system must be capable of extracting and using it."

I want to summarize these again. He kind of already summarized it, but he's saying that to be able to respond intelligently, the solutions can't be random in there. There has to be some pattern and you have to be able to detect that pattern.

Your generator shouldn't just be generating zero, then one, then two. It's got to generate something better than that, because the structure is in there. And it has to be able to act on the pattern it detects, to generate differently depending on it. Let me say it again.

Can't be random. You have to be able to detect the pattern, and then you have to be able to act on that pattern. Generate different solutions, different potential solutions depending on the pattern you see.
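
A tiny illustration in Python, my own example rather than the paper's: sorted order is the simplest detectable pattern, and a generator that exploits it (binary search) behaves differentially, while patternless data leaves nothing better than checking candidates one by one.

```python
def linear_search(items, target):
    # No pattern to exploit: check candidates one at a time.
    for i, x in enumerate(items):
        if x == target:
            return i
    return -1

def binary_search(items, target):
    # A detectable pattern (sorted order) lets the generator behave
    # differentially: every comparison rules out half the space.
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data = list(range(0, 1_000_000, 7))  # ordered, so the pattern is there
print(binary_search(data, 700007))   # ~18 probes instead of ~100,000
```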

Here, he's going to give an example. "Consider the problem of solving a simple algebraic equation. AX + B equals CX + D. One could use a generator that would produce numbers which could then be tested by substituting in the equation. We would not call this an intelligent generator."

We just generate all the numbers between negative one million and positive one million, and we test them all: replace X with that number, and does it satisfy the equation? That would not be intelligent. It's basically just as good as randomly trying them. It's brute force.
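
Here's that unintelligent generator as a Python sketch of my own:

```python
def solve_brute_force(a, b, c, d, limit=1_000_000):
    # Generator: every integer in a huge range, in no informed order.
    # Test: substitute into a*x + b = c*x + d and check.
    for x in range(-limit, limit + 1):
        if a * x + b == c * x + d:
            return x
    return None

print(solve_brute_force(3, 5, 1, 11))  # => 3, after about a million tests
```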

"Alternatively, one could use generators that would use the fact that the original equation can be modified by adding or subtracting equal quantities from both sides, or multiplying or dividing both sides by the same quantity without changing its solutions.

"We can obtain even more information to guide the generator by comparing the original expression with the form of the solution and making precisely those changes in the equation that leave its solution unchanged."

You can be smart about this generator. This generator can know something about the operations of addition, subtraction, multiplication, and division, and know how you can manipulate this equation without changing its solutions, so the equation still holds. You subtract B from both sides, stuff like that.

Now, that's one part. The second part is we know what the answer needs to look like. It has to be X equals some number, right? Or something. We want X on one side of the equal sign by itself.

We can compare the solutions we're generating and generate solutions that are leading in that direction. Have we moved stuff that's not X onto this side and stuff that is X onto that side, so you can kind of have some distance calculation. How far are we from a solution and are we getting closer?

"First, each successive expression is not generated independently, but is produced by modifying one produced previously. Second, the modifications are not haphazard, but depend on two kinds of information. Information that is constant over this whole class of algebra problems and information that changes at each step."

The information that's constant is how the operations work algebraically. The information that changes at each step is the differences that remain between the current expression and the desired expression.

"In effect, the generator incorporates some of the tests, the solution must satisfy so that expressions that don't meet these tests will never be generated." Instead of testing brute-force, we're limiting what we generate to only things that get us closer. Now, we're talking about search trees.

"The simple algebra problem may seem an unusual example of search. We're more accustomed to thinking of problem-solving search as generating lushly branching trees of partial solution possibilities, which may grow to thousands, millions..." Nowadays billions, so the thing is the tree of the algebra problem does not branch. [laughs]

You can always know what to do next. It's clear. You just subtract some stuff out first, and then you divide some stuff, and you're done. You're always just getting a little bit closer with each move. If you're looking at chess, or more complex problems than just algebra problems, you've got branching. Now, what do you do?

"One line of research into game playing programs has been centrally concerned with improving the representation of the chessboard, and the processes for making moves on it, so as to speed up search and make it possible to search larger trees. On the other hand, there is good empirical evidence that the strongest human players seldom explore trees of more than 100 branches.

"This economy has achieved, not so much by searching less deeply than do chess-playing programs, but by branching very sparsely and selectively at each node. This is only possible by having more of the selectivity built into the generator itself."

Now notice, this is just like they said in the biography. These two researchers are more concerned with making an intelligent system that acts the same way a human would act. They're not talking about how to make it search better just to make it better.

They're saying, "We studying masters, chess masters, and they're not branching as much as we're branching. They must have a generator that's smarter than ours, so that's the direction we're going to go." They would much rather do that than come up with some more efficient way of testing and just brute force generate more moves.

The somewhat paradoxical-sounding conclusion is that search is a fundamental aspect of a symbol system's intelligence, but that amount of search is not a measure of the amount of intelligence being exhibited. [laughs] [indecipherable 98:18] pretty funny thing to say. It's not about like, "We need more search. The more search, the smarter the system."

As people get better at a task, like chess, they actually search less. That's part of the intelligence, that they recognize areas of the board that they don't even need to look at, or they see that it must be one of these 5 moves, not one of the 20 possible moves. They see that instantly. It reminds me of how the AlphaGo program works.

AlphaGo has two parts. It has the generator, and it has the tester. The generator is itself a deep neural net. It has been trained on lots and lots of games to recognize potential good moves. It will generate what it considers the best moves first. This is just a generator. It's just looking at the board and saying, "Maybe here. Maybe here. Maybe here. Maybe here."

Then there's a test, which looks deep down the tree of possible moves. If I move here, then they're going to move here. It does some kind of sampling. It doesn't go all the way to a win. It samples to limit the amount of search that has to be done. That's how it works.
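
Here's the shape of that division of labor as a Python sketch. To be clear, this is not AlphaGo's actual architecture; `policy` and `simulate` are hypothetical stand-ins for the trained network and the playout machinery:

```python
def pick_move(state, legal_moves, policy, simulate, n_samples=50):
    # Generator: `policy` scores moves; keep only its top proposals.
    proposals = sorted(legal_moves, key=policy, reverse=True)[:5]

    # Test: `simulate` plays one sampled game from a move and returns
    # 1 for a win, 0 for a loss. Estimating a win rate by sampling
    # avoids searching the whole game tree.
    def win_rate(move):
        wins = sum(simulate(state, move) for _ in range(n_samples))
        return wins / n_samples

    return max(proposals, key=win_rate)
```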

Here's the thing. I've heard people say that just the generator, without refining its proposals with a score of how likely you are to win down each branch, is already a really high-level player. It would beat an amateur. It just takes the move it sees as best. Boom. It doesn't even look ahead. [laughs]

It just says, probably there, move there. It's not looking, it's not analyzing deeper than that. It just looked at the board: I think there. That player would beat you, unless you're a master. I don't know who you are, but it would beat me, and I know how to play Go.

It's not beating me because I don't know the rules or something like that. I know the rules. I'm not very good, but it would beat me. I find that really amazing, that the generator has been tuned to be that smart. Yeah.

Then there's the test, which is also pretty cool. That's sampling to see, how likely are we to win if we go down this branch? It's not actually looking at all the possibilities. It's just sampling. "When the symbolic system knows enough about what to do, it simply proceeds directly toward its goal.

"Whenever its knowledge becomes inadequate, it must go through large amounts of search before it finds its way again." That's really interesting. I think that is very human. When we don't know much about a domain, we have to do a lot of trial and error, a lot of search, basically. The more you learn, the more directly you go right to the answer.

"The potential for exponential explosion warns us against depending on the brute force of computers, as a compensation for the ignorance and unselectivity of their generators. The hope is still ignited in some human brains that a computer can be found that is fast enough and that can be programmed cleverly enough to play good chess by brute-force search."

This makes me think a lot about AlphaGo again. I don't remember, but the cost of just training AlphaGo, so not all the experiments that led up to how it works and all the programming time and all that, but just the training like, "Let's train this thing on like billions of games," costs millions of dollars.

At some stage, we are doing what he said, right? Training a neural net can also be modeled as search. We're deferring the problem, right? The learning part is now this brute-force search, basically, and we have the resources to do it. That scares me. It scares me.

I often think that the way we're doing neural net training these days, with the number of data points and the amount of processing time they require, is more on the scale of evolution than of learning within your lifetime.

It's like rerunning, recapitulating, like the development of the visual cortex to do vision, right? It's not saying, "Well, we have a visual cortex. We know how to make one of those. Now, let's teach it to recognize cats."

It's saying, "Let's just start with nothing. Just big neural net, and show it millions and billions of cats and other stuff, and it'll distinguish between them. The weights of those neural nets will determine the structure that is needed to do that."

That's why it takes so much time to train these things, and so many data points. It's like running through the whole Cambrian explosion. This is a metaphor, of course. It's not actually recapitulating those things.

The Cambrian explosion, a lot of it happened because of the development of the eye. Now you could see, or at least a little light-sensitive patch could tell the difference between light and dark. Then predators could see what they were trying to eat, so prey had to learn to see predators to get away from them.

Boom! This big explosion of different strategies for survival. That's what I see. That's why it takes that many resources and that much energy, if you just look at the energy expenditure. Of course, it's much more directed. You don't have to worry about eating, you don't have to worry about finding a mate. All that stuff is gone.

It's just, play Go. [laughs] Just recognize cats. It's still on that same order of magnitude, that amount of culling and shaping of the neural net.

Of course, there's reuse of neural nets. You can train a neural net to do one task, let's say a visual task, and then kind of chop off the last two layers and put two new layers on it. It's already got most of the visual cortex in there, so you don't have to redo that again.
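
That reuse looks something like this in PyTorch, assuming torch and torchvision are available; the particular model and weights are just one common choice, not anything from the lecture:

```python
import torch
import torchvision

# Load a network already trained on a big visual task...
model = torchvision.models.resnet18(weights="DEFAULT")

# ...freeze the early layers, the "visual cortex" we want to keep...
for param in model.parameters():
    param.requires_grad = False

# ...and swap in a fresh final layer for the new task, say cat vs.
# not-cat. Only this new head gets trained.
model.fc = torch.nn.Linear(model.fc.in_features, 2)
```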

Still, it seems like we're not doing this the way he was talking about where we're looking at how people see and building that into our system. Well, I won't comment on that anymore.

He's talking a lot about AI and I have a master's degree in AI so I have opinions on this stuff. "The task of intelligence is to avert the ever-present threat of the exponential explosion of search.

"The first route is to build selectivity into the generator. The usual consequences to decrease the rate of branching not prevented entirely. Ultimate exponential explosion is not avoided, but only postponed. Hence, an intelligent system generally needs to supplement the selectivity of its solution generator with other information using techniques to guide search."

You can't prevent exponential explosion, certainly not generally, so we need these information-using techniques. We've got to guide the search. What does that mean? Which path do we go down?

We've got a branch here, 10 different things, and you've got to choose which one to go down. "20 years of experience with managing tree search in a variety of task environments has produced a small kit of general techniques, which is part of the equipment of every researcher in artificial intelligence today.

"In serial heuristic search, the basic question always is, what shall be done next? In tree search that question in turn has two components. One, from what node in the tree shall we search next? Two, what direction shall we take from that node?" I think that's pretty self-explanatory.

"The techniques we have been discussing are dedicated to the control of exponential expansion rather than its prevention. For this reason, they have been properly called weak methods. It is instructive to contrast a highly structured situation which can be formulated say as a linear programming problem.

"In solving linear programming problems, a substantial amount of computation may be required, but the search does not branch." He just wants to say that it's not really about the amount of computation. It's more about this branching.

He talks about some other...What do you call it? Other approaches besides this generate and test approach, but I'm not going to go down there. "New directions for improving the problem-solving capabilities of symbol systems can be equated with new ways of extracting and using information. At least three such ways can be identified." He's talking about future work. Like, where do we go from here?

I do want to say, I think it's clear now, by how much I've read about AI, that this is a pretty good summary of AI, at least the paradigm where it was about search: problem representation, generate solutions in the problem space, test whether they're a good solution. How do you generate better solutions? It's all about search.

I think they've lost the thread of science at this point. [laughs] They're just summarizing AI. Now, they're going into future work in AI. Again, that strengthens what I'm saying. This is good evidence for that.

"New directions for improving the problem-solving capabilities of symbol systems can be equated with new ways of extracting and using information. At least three such ways can be identified.

"First, it has been noted by several investigators that information gathered in the course of tree search is usually only used locally to help make decisions at the specific node where the information was generated.

"Information about a chest position is usually used to evaluate just that position, not to evaluate other positions that may contain many of the same features. Hence, the same facts have to be rediscovered repeatedly at different nodes of the search tree."

A simple example: sometimes you have two moves that, if you do them in different orders, produce the same position. Let's make it easy. First, you move this pawn. They move their pawn. Then you move that pawn. They move their pawn. The board doesn't care which order you moved the pawns in. It's the same end state, but the way you generate the game tree, those are two different branches.

The fact that the board looks the same at those two different branches, they're not using that information. We've already looked at this board, but it was in the other branch. It's hard to do.

"A few exploratory efforts have been made to transport information from its context of origin to other appropriate contexts. If a weakness in a chess position can be traced back to the move that made it, then the same weakness can be expected in other positions descended from the same move."
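
That reuse across branches is what chess programmers call a transposition table, and memoization gives you the idea in miniature. Here's a toy of my own, where a "position" is just a number and two move orders land on the same cached entry:

```python
from functools import lru_cache

# Toy game: a move adds 1 or 2 to the position; after `depth` plies
# we score it. Different move orders reach the same number, so
# caching by (position, depth) acts as a tiny transposition table.
@lru_cache(maxsize=None)
def value(position, depth):
    if depth == 0:
        return position % 7        # stand-in for a real evaluation
    return max(value(position + move, depth - 1) for move in (1, 2))

print(value(0, 4))  # +1 then +2 and +2 then +1 share one cached entry
```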

"Another possibility. A second act of possibility for raising intelligence is to supply the symbol system with a rich body of semantic information about the task domain it is dealing with. For example, empirical research on the skill of chess masters shows that a major source of the master's skill is to recognize a large number of features on a chessboard.

"The master proposes actions appropriate to the features recognized." Just having this encyclopedic knowledge of the features like, "Oh, this shape here. I've got to move my knight to block this..." They're thinking at this high level. They're not thinking at the low level that usually the chess programs are thinking.

"The possibility of substituting recognition for search arises because a rare pattern can contain enormous amounts of information. When that structure is irregular and not subject to simple mathematical description, the knowledge of a large number of relevant patterns may be the key to intelligent behavior.

"Whether this is so, in any particular task domain is a question more easily settled by empirical investigation than by theory." This goes back to the ghost stuff where neural nets are really good at detecting these patterns and learning these patterns, so that could be more know what is talking about. Why go programs can work now?

"A third line of inquiry is concerned with the possibility that search can be reduced or avoided by selecting an appropriate problem space." This is interesting. Some problems can be solved by using a better representation, and in that new representation, the solution is way more obvious.

Maybe you could say it requires less branching. If you try to do it in the regular representation, you have a hard problem, but in this other one, it's much easier. He gives an example where you have a checkerboard, and you have 32 tiles that are one-by-two rectangles. Each tile can cover two squares, right?

You can cover the whole board with these 32 tiles, right? They each cover two squares, and there are 64 squares on a checkerboard. All right. Now, here's the problem. You remove two opposite corners of the checkerboard, so you've removed two squares.

Can you now tile it with 31 of these tiles? Can you cover it? Well, you could brute-force it, and try every single possible combination and see if you can do it, but you could also change the representation. You could say, "Wait a second. Each tile covers both a black and a white square."

Because it's a checkerboard pattern, you don't have two black squares next to each other or two white squares next to each other. Every one-by-two tile has to cover a black and a white. If we remove two opposite corners, those are always the same color. We remove two whites or we remove two blacks.

There's no way now, because if we remove the two white corners, then there are two blacks that don't have a white to pair with, so we can't place all the tiles. Wow, we solved the problem. We didn't have to brute-force the search. How do you find that representation?

We've avoided all this search by changing the representation. Now, he says, "Perhaps, however, in posing this problem, we are not escaping from search processes. We have simply displaced the search from a space of possible problem solutions to a space of possible representations. This is largely unexplored territory in the domain of problem-solving research."
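
Here's the parity argument as a few lines of Python, my own sketch of how the new representation replaces search with counting. The color balance is only a necessary condition for a tiling, but it's enough to answer this problem with no search at all:

```python
def can_possibly_tile(removed):
    # Count colors instead of searching tilings: each 1x2 tile covers
    # one black and one white square, so a tiling can exist only if
    # the remaining colors balance.
    squares = {(r, c) for r in range(8) for c in range(8)} - set(removed)
    blacks = sum((r + c) % 2 == 0 for r, c in squares)
    return blacks == len(squares) - blacks

# Opposite corners are the same color, so no tiling can exist.
print(can_possibly_tile([(0, 0), (7, 7)]))  # => False
```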

And I said before that often in AI, it's still the human who represents the problem. Even in neural nets, you come up with this space of vectors that's supposed to represent all possible solutions. I'm going to end now by reading what I read at the beginning. Maybe it will make more sense. A.M. Turing concluded his famous paper on Computing Machinery and Intelligence with the words, "We can only see a short distance ahead, but we can see plenty there that needs to be done."

Many of the things Turing saw in 1950 that needed to be done, have been done, but the agenda is as full as ever. Perhaps, we read too much into his simple statement above, but we like to think that in it, Turing recognized the fundamental truth that all computer scientists instinctively know.

For all physical symbol systems, condemned as we are to serial search of the problem environment, the critical question is always, what to do next?

Quite a paper, well worth a few reads. It's two of the greats that, unfortunately, are easy to forget about and overlook, but, man, their work was really important for artificial intelligence.

I don't know if I've stated this before sometime on the podcast, but I feel like the artificial intelligence project, the original one, to create machines that can do things that normal humans can do, that was very important in computer science. It's often pooh-poohed as like, "Oh, what has AI ever done?"

We have to remember that even stuff like parsing and compiling a Fortran program was considered artificial intelligence. It was considered automatic programming, so we have to give them credit.

What happens is they generate all this cool stuff on the way to trying to make a better chess program, and then it becomes more widely practical and leaks out into the rest of computer science. This is the joke: once it works, they stop calling it AI, and it's just computer science.

Look at the list of stuff that IPL had: data structures, data types and their operations, dynamic memory allocation. This is stuff that we take for granted these days, but it was part of the path to artificial intelligence. Without that path, I feel like the situation of computing today would be a lot poorer than what we have now.

Thank you, Herb Simon and Allen Newell. I do want to recommend Herb Simon's book, Sciences of the Artificial. It goes much deeper than this short lecture did. That's all I have to say.

Thank you for listening, and as always, rock on.

Eric Normand: A.M. Turing concluded his famous paper on "Computing Machinery and Intelligence" with the words, "We can only see a short distance ahead, but we can see plenty there that needs to be done." Many of the things Turing saw in 1950 that needed to be done, have been done, but the agenda is as full as ever.

Perhaps we read too much into his simple statement above, but we like to think that, in it, Turing recognized the fundamental truth, that all computer scientists instinctively know, for all physical symbol systems condemned as we are two serial search of the problem environment. The critical question is always, what to do next?

Hello, my name is Eric Normand. Welcome to my podcast. Today, I am reading from the ACM Turing Award lecture from 1975. Alan Newell and Herbert Simon were awarded it jointly. Their Turing award lecture is called "Computer Science as Empirical Inquiry -- Symbols and Search."

As you may know, if you've listened before, I like to read from the biography. Of course, there's two biographies, because there are two different recipients of the 1975 ACM Turing Award. I'll read the first one, Allen Newell. He's actually the younger of the two. Well, I'll just get in there.

He was born in 1927. This award was given in 1975. He was 48 years old when he received it. He co-received it for making basic contributions to artificial intelligence, the psychology of human cognition, and list processing.

I think when they use the term basic here, basic contributions, I think that's a good thing. Not basic, like simple and easy, but basic like fundamental, which is true. They were part of the early crowd of the artificial intelligence group. OK, I'll read some more stuff from his biography. You should definitely read the whole thing, but I'll just read and comment on the parts that stood out to me.

"Newell is chiefly remembered for his important contributions to artificial intelligence research, his use of computer simulations in psychology, and his inexhaustible, infectious energy. His central goal was to understand the cognitive architecture of the human mind and how it enabled humans to solve problems.

"For Newell, the goal was to make the computer into an effective tool for simulating human problem-solving. A computer program that solved a problem in a way that humans did not or could not was not terribly interesting to him, even if it's solved that problem better than humans did."

This is an interesting statement, especially compared to the current approach, the current dominant paradigm in artificial intelligence and machine learning these days, which is deep neural networks. The way they learn is just through millions, if not billions of data points, and it's certainly not how we learn.

It's interesting that at the beginning, there were people there really searching for how we learned. AI was almost a mirror. It was a tool for learning about ourselves. I find that really interesting. That's always what drew me to AI. When I was still in school, that's what I found interesting, that it was telling me about myself.

"As part of this work on cognitive simulation, Newell, Simon and Shaw developed the first list-processing language, IPL, which, according to Simon, introduced many ideas that have become fundamental for computer science in general, including lists, associations, schemas, also known as frames, dynamic memory allocation, data types, recursion, associative retrieval, functions as arguments, and generators, also known as streams.

"John McCarthy's LISP, which became the standard language in the AI community after its development in 1958, incorporated these basic principles of IPL in a language with an improved syntax and a garbage collector that recovered unused memory."

I might have it in a quote somewhere else. IPL was more like an assembly language, but it did have all these other features, like a list of data structure. You could allocate memory dynamically, have other data types. That's really interesting.

The other recipient, Herbert Alexander Simon, also known as Herb Simon, his birthday was in 1916, makes him 58 at the time of reception. Did I do that right, that math? 59? Yeah, 59. He was his professor. He was Allen Newell's professor for a long time.

Apparently, it says that they were equals. It was just that Herb Simon was the older and already had a position. He had Newell come as a grad student so that they could work together, but they actually were peers.

The citation, as they call it, is the same, so I'll read from his biography. Pretty interesting character. "The human mind was central to all of Simon's work, whether in political science, economics, psychology, or computer science.

"Indeed, to Simon, computer science was psychology by other means. Simon's remarkable contributions to computer science flowed from his desire to make the computer an effective tool for simulating human problem-solving."

It's very similar to Alan Newell's goal of simulating the way humans think, not just solving the same kinds of problems that humans can solve. It has to do it in the same way.

The other thing is it talks about his work in economics. He won the Nobel Prize in economics. Herb Simon coined the term satisficing and it was this breakthrough idea in economics that you couldn't optimize. You could only satisfice.

There's too much information for a person to process and choose the perfect solution, to satisfying their needs and doing a cost-benefit analysis. You have to use heuristics to make progress in the world. Often, what people do is choose whatever works. The first thing that comes to mind that might work and not optimizing.

He won the Nobel Prize for that. It seems like common sense today, but that's his influence on us. That's how we think these days.

"In addition to employing principles of heuristic problem solving, the Logic Theorist was an error-controlled feedback machine that compared the goal state with the current state and formed one of a small set of basic operations in order to reduce the difference between the two states. The Logic Theorist was a remarkable success.

"Simon Newell and Shaw elaborated on its basic principles in creating another renowned program, the General Problem Solver or GPS, in 1957 and '58. The GPS was not quite as universal as its name implied, but it was startlingly good at solving certain kinds of well-defined problems. Even more GPS, like LT, appeared to solve them in much the same ways that humans did."

This does go back -- we're turning 1957, 1958 -- to a different paradigm of artificial intelligence where they were much more closely tied to, say, the psychology department. They were doing psychological experiments to understand how people solve problems.

They would give the person a problem. They would ask them to solve it and talk. They would train them to talk about what they were thinking about at the time. They would try to figure out what steps did they take to get to the solution, and how do we generalize that?

He, Simon, said that, "We need a less God-like, and more rat-like, picture of the chooser." LT and GPS were intended to create just such rat-like models of how people actually solve problems in the real world. Rat-like meaning just mechanical and animal, and not some all-knowing, all-seeing entity.

He was a strong, even fierce advocate of the computer program as the best formalism for psychological theories, holding that the program is the theory. The fullest statement of this belief was the monumental text, "Human Problem Solving," authored by Simon and Newell in 1972, in which they introduced the notion of a program as a set of production systems or if-then statements.

Here again, we see that he was this proponent of the idea that the best way to understand a person or psychology -- people in general -- is through computer programs because you can actually formalize the thoughts, the thought processes in a way that other theories of mind do not.

He talks about this a little bit in the lecture about how behaviorism and Gestalt's theory and all these other theories are so vague. You can't really use them to make predictions or anything. You need some mechanism, something, and a computer is a good simulation of that.

The flip side of this coin was his insistence that computer simulation was an empirical science that taught us new and valuable things about ourselves and our world. Simulation was not an exercise in elaborating tautologies. This is what the main topic of the talk is, computer science as empirical inquiry, that computer science is a science. It's a kind of science.

We'll get more into it in the paper. Last, but not least, Simon believed that organization and structure were critical. What his computer simulations simulated was not the actual physical operations of neurons in the brain, but rather the structure of problem-solving processes.

The computer program thus could be a structural model of the mind in action, not a model of its specific physical make-up. Two of the key conclusions he drew about the structure of our human mental processes are that they are hierarchical and that they are associative.

He believed that they have a tree structure with each node/leaf linked to a branch above it. Each leaf could either be one thing or a set of things, a list of things to be precise.

Since items on a list could call items on other lists, this model of the mind could work associatively within its basic hierarchical structure, creating webs of association amongst the branches of the mind's tree. He doesn't go too much into that in the talk.

It's been a while since I've done one of these lectures. I actually printed this out months ago and have been working through it. I've read it several times. I think it's well worth it.

Herb Simon is the only person, I think, to have won both a Nobel Prize and a Turing Award. He's kind of a big deal. He is an important person that we should recognize and, "Oh, I have it on myself."

He wrote a book that's called "Sciences of the Artificial." Also, definitely worth a read. Not very long. But as Alan Kay said, "He won a Nobel Prize and a Turing Award. Read his book. Come on. Why not?" He's an important figure and I think the big themes in this lecture are the topic of the book. I'll probably have some comments that I remember reading in the book.

I didn't re-read the book for this. I probably should have, but it was already getting too long so I didn't. The break was getting too long so I didn't have a chance to read it again.

Let's get into it. Just some information before I start. I don't read the whole thing. It's actually pretty long. It's 13 pages. I just pick out phrases and sentences, maybe whole paragraphs that I think are worth commenting on, that I have something to say about.

Another thing is this gets pretty wordy. He's not a concise writer. There's a lot of lilt to what he says, a lot of intellectual flourishes of his speech. I don't know who's actually doing the writing here, but it seems like someone who's used to being professorial.

I did my best in some places to skip words and sometimes whole phrases because they didn't really add anything just because it can...It is going to get long. It's already going to be too long. I did my best let me put it that way.

Some people are super succinct. You can find one sentence that says it all, and I just have to read that. Other people are like, "OK." Like, "The point starts at the top here, and now, I'm at the bottom and I see the other half of the point." He made this long thing in between, like, "Can I skip it?" It's hard to figure out. OK, so let's just start at the beginning.

"The machine -- not just the hardware, but the programmed living machine -- is the organism we study."

He's still in the first paragraph here. He's in the introduction. When most people who have won the award start, they often referred to the past lectures, trying to put some spin, some perspective on what they can tell us. I'm going to skip the part where he explicitly talks about the other lectures, but he's contrasting his view with the other views.

For instance, Dijkstra had that famous quote in his Turing lecture that, "Computer science is not about the machines, it's about the programs." Just like astronomy is not about telescopes, computer science is not about computers. Something like that.

Well, Simon and Allen Newell, they're saying that it's the running machine with software on it, the behavior of the machine, which I find is very different from the people who have come before. It's not about the software, the abstractions. It's not about the hardware and how you do stuff efficiently.

We've seen people talk about that, but their view is that it's the running, living programmed hardware. I love that, because it's a totally new view from his book where he really talks about how it's a new thing to study.

There's properties of it that we don't understand and we can empirically determine. For instance, how do search algorithms work, and things like that. Those are the kinds of things we can empirically study by running programs on computers.

Here he goes into the main topic. "Computer science is an empirical discipline. Each new machine that is built is an experiment. Actually constructing the machine poses a question to nature, and we listen for the answer by observing the machine in operation and analyzing it by all analytical and measurement means available."

He continues, "Each new program that is built is an experiment. It poses a question to nature and its behavior offers clues to an answer. Neither machines nor programs are black boxes. They are artifacts that have been designed, both hardware and software, and we can open them up and look inside."

Here, he's laying out the whole thesis that this thing that we've created, these machines with software that runs on them, they're like a whole new world that we can do experiments on.

"As basic scientists, we build machines and programs as a way of discovering new phenomena and analyzing phenomena we already know about. Society often becomes confused about this believing that computers and programs are to be constructed only for the economic use that can be made of them.

"It needs to understand that the phenomena surrounding computers are deep and obscure, requiring much experimentation to assess their nature. It needs to understand that, as in any science, the gains that accrue from such experimentation and understanding pay off in the permanent acquisition of new techniques.

"And that, it is these techniques that will create the instruments to help society in achieving its goals. Our purpose here, however, is not to plead for understanding from an outside world. It is to examine one aspect of our science, the development of new basic understanding by empirical inquiry."

Let's talk a little bit about empiricism and computer science. I actually read two books at the same time. I read Herb Simon's, "Sciences of the Artificial" and another book called "Algorithms to Live By." I felt that they complemented each other. Algorithms to Live By gave some good examples of phenomena that computer science elucidate. One of them was why a bigger load of laundry takes longer than a smaller load.

It's quite simple, that if you sort your clothes, sorting is...It's bigger than order N. It's like order N log N are the best. The bigger your load, the longer it takes. It's not linearly, it's more than linear like growing.

Smaller batches of clothing should be faster to do. Of course, we probably don't sort our clothes anymore so much like we used to. You get the idea that this can teach us why certain things in our everyday lives take longer. Why is it easy to sort a small set of, let's say, playing cards versus the whole set?

The small set, you can see the whole thing and keep it all in your head, and boom, right in order. If there's 52 of them, you can't see them all. You're moving around a lot. It takes a long time.

I think that that's the thing that he's talking about. It might be a very simple basic example. If you extrapolate that too, now, let's see if we can explain what's happening in a human mind by writing a program that does the same thing.

We can understand that program and what it's doing. By analogy, what must be happening in that mind, and why certain things work better in the program, does that match up with what we see in the mind? It's a probe into complex systems that are hard to understand by themselves, that we have this tool of simulation that we couldn't have before.

Another thing is he's going to go deeper into science and also into artificial intelligence. This is maybe one of the best short summaries of artificial intelligence at that time that I've ever read, so we'll get into that too.

Finally, I just want to say that he's talking about...He's going to examine one aspect of our science, the development of new basic understanding by empirical inquiry. He's going to give some examples, some illustrations. These are from their own work. They were big in artificial intelligence so a lot of them are going to be artificial intelligence examples.

Time permits taking up just two examples. The first is the development of the notion of a symbolic system. The second is the development of the notion of heuristic search.

Both conceptions have deep significance for understanding how information is processed and how intelligence is achieved, however, they do not come close to exhausting the full scope of artificial intelligence, though they seem to us to be useful for exhibiting the nature of fundamental knowledge in this part of computer science.

They're two examples that they're going to give, symbolic systems and heuristic search.

"One, symbols and physical symbol systems. One of the fundamental contributions to knowledge of computer science has been to explain, at a rather basic level, what symbols are. This explanation is a scientific proposition about nature. It is empirically derived with a long and gradual development."

This is a big, mysterious statement. It took me a long time to get this. Symbols, I think he's referring to symbols as in not symbols in general, but symbols as in LISP symbols, but that he claims that they're...they claim, sorry, that they're very...the same kinds of things happening in our minds, and we'll get to that.

A LISP symbol is just a string, a different type, but it's just a string of characters, and it can represent something. We'll get to that in a second. The important thing is that it's empirically derived. They've done a certain number of experiments that got to that point, that it wasn't just arbitrary. It wasn't like, I don't know, "We just invented this thing and it works."

"Symbols lie at the root of intelligent action, which is, of course, the primary topic of artificial intelligence. For that matter, it is a primary question for all computer science. For all information is processed by computers in the service of ends, and we measure the intelligence of a system by its ability to achieve stated ends in the face of variations, difficulties, and complexities posed by the task environment.

"This general investment in computer science in attaining intelligence is obscured when the task being accomplished are limited in scope, for then the full variation in the environment can be accurately foreseen.

"It becomes more obvious as we extend computers to more global, complex, and knowledge-intensive tasks, as we attempt to make them our own agents, capable of handling on their own the full contingencies of the natural world."

He's just saying that, sometimes, you have these simple examples, and they're so small that it doesn't look intelligent. You can see that it's just obviously a little mechanism. As you have to deal with unforeseen things, you need more intelligence. The decisions have to be smarter. Those are the kinds of systems that require a symbol system.

"Our understanding of the system's requirements for intelligent action emerges slowly. It is composite, for no single elementary thing accounts for intelligence in all its manifestations. There is no 'intelligence principle,' just as there is no 'vital principle' that conveys by its very nature the essence of life.

"But the lack of a simple deus ex machina does not imply that there are no structural requirements for intelligence. One such requirement is the ability to store and manipulate symbols."

He's trying to make an analogy. They are trying to make an analogy -- sorry, I keep doing that -- that there's no one thing that makes something alive. There's no one thing that makes something intelligent. Some things are required. You can't just throw it all away and say, "There's no way to describe it." There's a requirement, and that is a store and manipulate symbols.

Now, he's going to go on a little detour and talk about science, because he has to explain his little background.

"All sciences characterize the essential nature of the systems they study. These characterizations are invariably qualitative in nature, for they set the terms within which more detailed knowledge can be developed."

They define the terms qualitatively, and this lays the groundwork for what you can talk about in that field. "A good example of a law of qualitative structure is the cell doctrine in biology, which states that the basic building block of all living organisms is the cell. The impact of this law on biology has been tremendous, and the lost motion in the field prior to its gradual acceptance was considerable."

This is a really interesting statement here. This was 1975, remember. I learned this cell doctrine, cell theory in biology class in high school, that all living things are made of cells.

There's like three pieces to it, but it's basically that everything is made of cells, cells are really small, and nothing is smaller than a cell, something like that. The first thing that people nowadays say when you bring up this cell theory is, "What about viruses? Aren't viruses alive? They have DNA. They multiply. They're alive."

We have a different view today that doesn't always mesh with the cell theory. What I think is difficult is that, in science, you can have a theory that's not 100 percent right, and it's still productive. It's still useful.

He says that the impact of this law on biology has been tremendous because, before that, we had no idea how the body worked. Once you could break stuff up into cells, you can start doing cell physiology and understand, what's happening inside one cell, that that's an important thing to understand. It really opened up the field.

Does that mean that they didn't include viruses as alive? Yes, they did not. That's fine. There is still a lot to learn with this frame. I think that it takes a lot of science education to realize that these theories are just mental toys.

They are models that include and exclude parts of the phenomena. They're just useful for thinking. They're useful for framing a problem, for deciding what to work on next, things like that.

I just wanted to talk about that because he brought up cell doctrine, and I thought it's a good chance to talk about that aspect of science, especially when we're now 50 years, almost 50 years after this in a totally two paradigms later in artificial intelligence. It doesn't invalidate what they were doing, that there's now a different dominant paradigm.

"The theory of plate tectonics asserts that the surface of the globe is a collection of huge plates -- a few dozen in all -- which move against, over, and under each other into the center of the Earth, where they lose their identity."

Just another example of a big theory that's qualitative and foundational, and once you have that, it puts a totally different picture, it paints a totally different picture of what's going on, in this case, on the Earth with the continents.

"It is little more than a century since Pasteur enunciated the germ theory of disease. The theory proposes that most diseases are caused by the presence and multiplication in the body of tiny, single-celled, living organisms, and that contagion consists in the transmission of these organisms from one host to another."

Another example. Another one, the doctrine of atomism offers an interesting contrast to the three laws of qualitative structure we have just described.

"The elements are composed of small, uniform particles differing from one element to another, but because the underlying species of atoms are so simple and limited in their variety, quantitative theories were soon formulated, which assimilated all the general structure in the original qualitative hypothesis."

OK, just talking about atoms and elements, how this leads to chemistry. Now, he's going to conclude about this. "Laws of qualitative structure are seen everywhere in science. Some of our greatest scientific discoveries are to be found among them. As the examples illustrate, they often set the terms on which a whole science operates."

He's going to lay out two of those for computer science. Artificial intelligence, really.

"Let us return to the topic of symbols, and define a physical symbol system. A physical symbol system consists of a set of entities, called symbols, which are physical patterns that can occur as components of another type of entity called an expression or symbol structure. Thus, a symbol structure is composed of a number of instances of symbols related in some physical way such as one token being next to another.

"At any instance of time, the system will contain a collection of these symbol structures. The system also contains a collection of processes that operate on expressions to produce other expressions, processes of creation, modification, reproduction and destruction.

"A physical symbol system is a machine that produces through time an evolving collection of symbol structures." This is a physical symbol system. Over time, there is a collection of expressions that changes over time.

"Two notions are central to the structure of expression, symbols and objects, designation and interpretation. Designation. An expression designates an object if given the expression, the system can either affect the object itself or behave in ways dependent on the object."

It says affect, affect. I think they might mean effect can have an effect on the object itself. Either way, what a symbol can designate something, meaning it can refer or represent something if the system can change it or behave in ways dependent on it.

Like if you say, "That cat," the system can maybe poke the cat, right? It somehow knows that that symbol for cat, that expression that cat can also be poked with a rod that's affecting it. Then, if you can behave in ways depending on the object so if the cat moves, maybe you can...your expression about what about that cat changes.

Interpretation. This is the second thing. T"he system can interpret an expression if the expression designates a process, and if given the expression, the system can carry out the process." Now, expressions can designate a process and he's calling this interpretation when you run it.

We get a lot of lisp vibes from this that you can make expressions out of symbols and then run them.

Now, he's got this hypothesis. "The physical symbol system hypothesis. A physical symbol system has the necessary and sufficient means for general intelligent action. By necessary we mean that any system that exhibits general intelligence will prove upon analysis to be a physical symbol system.

"By sufficient, we mean that any physical symbol system of sufficient size can be organized further to exhibit general intelligence. By general intelligent action, we wish to indicate the same scope of intelligence as we see in human action."

I don't want to get into the debates about whether general intelligence is possible a good term. General is just...It accepts too much, right? They do define it as saying it's the same scope that we see in human action. The ability to deal with a lot of nuance and variability in our environment and stuff.

Those things always bug me, because it's very rare...yes, humans can play the violin really well, and they can play chess really well, and they can do acrobatics, and not fall down, let's say. Yes, it's true, but it's rare that you have a single person that does all three very well. [laughs] It's not fair that you expect the same machine to do all three.

I worked in robotics, so I have a lot of those kinds of feelings, like, "It's not fair. Humans fall down all the time." I wanted to talk about this designation again. That's really where the meaning comes from where in this view, the thing that connects the symbol to the real world, to an object or something in the real world, is the ability to effect the thing in the real world, to poke the cat, or to behave differently when the object changes.

If the cat moves, maybe your head turns because you recognize the cat. It's a very mechanistic view of meaning, but I can't think of a better one. We often equate meaning. Meaning is one of what Minsky calls a suitcase word, where you just pack whatever you want in there. That meaning we often think of is like human meaning, like a lot of emotional investment in the thing.

He's excluding, let's say, the medium on which the symbol system runs. That's not quite as important as the fact that it is a symbol system. Maybe your symbol is love or your symbol is the word love or your symbols system is oxytocin, or some hormone that flows and sends a message to all cells, "Oh, this is what we're doing now."

It's a much more basic one than the word and one that exists in more animals and things, but it's serving the same purpose and perhaps, that's...I don't know. I don't know what else to say about that. We now need to trace the development of this hypothesis and look...Oh, wait, I missed something.

This is an empirical hypothesis. The hypothesis could indeed be false. We now need to trace the development of this hypothesis and look at the evidence for it. He's stating something pretty strong here.

The hypothesis itself is pretty strong, necessary and sufficient for intelligence, it's very strong, but it's falsifiable. If you could construct an intelligent system that does not have a symbol system, then it wouldn't be necessary.

If you construct one that...or if you could make a symbol system that was complex enough as he says, it's sufficient size, but that you couldn't organize into a general intelligence then also falsified. It's still a hypothesis. We haven't developed a general intelligence system.

"A physical symbol system is an instance of a universal machine. Thus the symbol system hypothesis implies that intelligence will be realized by a universal computer. However, the hypothesis goes far beyond the argument, that any computation that is realizable can be realized by a universal machine, provided that it is specified."

Universal machine, he's talking about a Turing machine. He's saying that yes, intelligence is runnable on a Turing machine. I don't know if that is still something that people argue about. I've accepted it for myself that a Turing machine could do what I do if we knew how to program it.

I think some people still hold out that there's something unique in the human mind, perhaps. I know some philosophers used to, but I'm not sure if that's the case anymore. I think people use computers all the time now and they're much more, I don't know. I don't know.

I don't think there's anything special and I think that the people in artificial intelligence at least at this time, did not think that human was special in that it violated the Turing universality.

He's saying another thing that the hypothesis goes further than just saying that the human mind is simulated will by a Turing machine. It's talking about its structure. How would it have to work, if it were simulated by a Turing machine?

"For it asserts, specifically, that the intelligent machine is a symbol system, thus making a specific architectural assertion, about the nature of intelligence systems. It is important to understand how this additional specificity arose."

OK. That's what I just said. "The roots of the hypothesis go back to the program of Frege and of Whitehead and Russell, for formalizing logic, putting the notions of proof and deduction on a secure footing. Logic, and by incorporation all of mathematics, was a game played with meaningless tokens. According to certain purely syntactic rules.

"Thus progress was first made by walking away from all that seemed relevant to meaning and human symbols. We could call this the stage of formal symbol manipulation." This does seem weird that, to understand intelligence, human intelligence better, we got rid of all meaning. [laughs]

We started talking about A's and B's, A or B, stuff like that. It's all just symbols. You would convert some English sentence that had a lot of meaning into some logical statement with just logical variables. You do some manipulation and then you convert the variables back to the things they represented in the real world.

Seems weird, right? It seems like you would want to know more about the thing, not abstract it down to a letter. But that was the first step.

"This general attitude is well reflected in the development of Information theory. It was pointed out, time and again, that Shannon had defined a system that was useful only for communication and selection, which had nothing to do with meaning."

I don't know if you know this, but some people tried to change the name from information theory because, at that time, information was thought of as something that had a lot of meaning. But the way Shannon formulated it was correct. It was the right way to do it.

It was just zeros and ones. It wasn't about meaning, or about information in the sense people understood it at that time. Now, we think of it like that. Yeah, sure. It's just bits.

Back then it was like, "Well, that's not really information. You can't learn anything from it. It's all about the amount of entropy in the system." That's it. Makes sense. Maybe the word is wrong, but it's changed now.

First nail in the coffin, here we go.

"A Turing machine consists of two memories, an unbounded tape and a finite state control. The tape holds data. The machine has a very small set of proper operations, read, write, and scan operations, on the tape. The read operation is not a data operation but provides conditional branching to a control state as a function of the data under the read head.

"As we all know, this model contains the essentials of all computers in terms of what they can do, though other computers with different memories and operations might carry out the same computations with different requirements of space and time.

"The model of a Turing machine contains within it the notions both of what cannot be computed and of universal machines. The logicians Emil Post and Alonzo Church arrived at analogous results on undecidability and universality. In none of these systems is there, on the surface, a concept of the symbol as something that designates."

This is another step in mechanizing thought, mechanizing decisions, mechanizing all this stuff. It's now all about zeros and ones on this tape. There's nothing about something that designates. There's no meaning.

"The data are regarded as just strings of zeros and ones. Indeed, that data can be inert is essential to the reduction of computation to physical process." There's no meaning anymore in this stuff. It's all zeros and ones, and basic operations of reading and writing and moving the tape.

"What was accomplished at this stage was half the principle of interpretation, showing that a machine can be run from a description. Thus, this is the stage of automatic formal symbol manipulation." We can represent program as data. That's the second step.

"With the development of the second generation of electronic machines in the mid-'40s, came the stored program concept. This was rightfully hailed as a milestone, both conceptually and practically. Programs now can be data and can be operated on as data.

"The stored program concept embodies the second half of the interpretation principle. The part that says that the system's own data can be interpreted, but it does not yet contain the notion of designation of the physical relation that underlies meaning." That was step three. Now, the last step, I think.

"The next step taken in 1956, was list processing. The contents of the data structures were now symbols in the sense of our physical symbol system, patterns that designated that had reference.

That this was a new view was demonstrated to us many times in the early days of list processing when colleagues would ask where the data were that is, which lists finally held the collections of bits that were the content of the system. They found it strange that there were no such bits, here were only symbols that designated yet other symbol structures."

They just won this award. They're allowed to talk about their work, but they're saying that their list processing, the IPL language was a breakthrough on the way to this hypothesis. They give a linear progression of logic, then Turing machine, then stored program computer and then list processing. All four of those steps lead you to this hypothesis.

"List processing is simultaneously three things in the development of computer science. One, it is the creation of a genuine dynamic memory structure in a machine that had heretofore been perceived as having fixed structure.

"It added to our ensemble of operations those that built in modified structure in addition to those that replaced and changed content. Two, it was an early demonstration of the basic abstraction that a computer consists of a set of data types and a set of operations proper to these data types.

"Three, list processing produced a model of designation, thus defining symbol manipulation in the sense in which we use this concept in computer science today." This is where he's...or they...Sorry.

They are presenting their work as a tangent to this hypothesis that they're presenting. They're just describing IPL, that you could have this dynamic memory structure, you can allocate little linked list nodes, and build lists dynamically, and change them and make interesting structures.

That you didn't have to have a fixed set of data that was in data statements at the end of your program, like a lot of languages did. The idea that it had different data types with operations that operated on those data types, that's interesting too. Of course, this model of designation, which they've already talked about.
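To make that concrete, here's a rough sketch in Python, my own illustration, not IPL or LISP code, of the kind of dynamic list structure they're describing: nodes allocated on demand, lists that share structure, symbols pointing at other symbol structures.

```python
# Dynamic list structure in the IPL/LISP spirit (my own sketch):
# each node is a (head, rest) pair allocated at runtime, and lists
# can share tails -- symbols designating other symbol structures,
# with no fixed memory layout decided up front.
def cons(head, rest=None):
    return (head, rest)

def to_list(node):
    out = []
    while node is not None:
        out.append(node[0])
        node = node[1]
    return out

shared = cons('b', cons('c'))       # built at runtime
xs = cons('a', shared)              # two lists sharing one tail
ys = cons('z', shared)
print(to_list(xs), to_list(ys))     # ['a', 'b', 'c'] ['z', 'b', 'c']
```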

"The conception of list processing as an abstraction created a new world in which designation and dynamic symbolic structure were the defining characteristics. We come now to the evidence for the hypothesis, that physical symbol systems are capable of intelligent action, and that general intelligent action calls for a physical symbol system.

"The hypothesis is an empirical generalization and not a theorem." He said that so many times. "Our Central aim however is not to review the evidence in detail, but to use the example before us to illustrate the proposition that computer science is a field of empirical inquiry, hence we will only indicate what kinds of evidence there is and the general nature of the testing process."

I said before, that this lecture is like the best short summary of artificial intelligence, this paradigm of artificial intelligence, that I've ever read.

That's where it starts, right here: it's going to be a lot of AI from now on. It's interesting, because I feel like there's quite a lot more to say on their main thesis, yet they seem to treat this evidence as all that's needed. Although I think it's very, very interesting, I don't think it's such great support for their main thesis, that computer science is empirical.

I do think that they just wanted to show their work. They wanted to illustrate with their work. They spend, I don't know, five or six pages on AI, and a lot of it is their work. "20 years of work in AI, has seen a continuous accumulation of empirical evidence of two main varieties. The first addresses itself to the sufficiency of physical symbol systems for producing intelligence.

"The second kind of evidence addresses itself to the necessity of having a physical symbol system, wherever intelligence is exhibited. The first is generally called artificial intelligence, the second is research in cognitive psychology." He's dividing their work into two fields. One is more, how do we build these systems that are intelligent? How do we make our chess program better, that kind of thing?

The other is research into humans. How do they think, what kinds of models can we develop, that kind of thing. "The basic paradigm for the initial testing of the germ theory of disease was identify a disease, then look for the germ. An analogous paradigm has inspired much of the research in artificial intelligence.

"Identify a task domain calling for intelligence, then construct a program for a digital computer that can handle tasks in that domain.

"The easy and well-structured tasks were looked at first. Puzzles and games, operations, research problems of scheduling and allocating resources, simple induction tasks. Scores, if not, hundreds of programs of these kinds have by now been constructed, each capable of some measure of intelligent action in the appropriate domain."

This is an interesting analogy he's making that if you had a disease, look for the germ. This is more like, "If you have a problem that humans can solve, try to solve it with a computer." [laughs]

Then, of course, the symbol system. This is, again, the first kind, addressing the sufficiency. Can a physical symbol system exhibit intelligence? That's sufficiency. Is it sufficient for intelligence? Then the second part, which is cognitive psychology, is the necessity. We'll look at that. He hasn't gotten to that yet.

"From the original tasks, research has extended to build systems that handle and understand natural language in a variety of ways, systems for interpreting visual scenes, systems for hand-eye coordination, systems that design, systems that write computer programs, systems for speech understanding. The list, if not is, if not endless, at least very long.

"If there are limits beyond which the hypothesis will not carry us, they have not yet become apparent. Up to the present, the rate of progress has been governed mainly by the rather modest quantity of scientific resources that have been applied, and the inevitable requirement of a substantial system building effort for each new major undertaking."

[pause]

Eric: He's just saying that it's gotten more complicated, there's a long list of programs that do somewhat intelligent stuff. Of course, we know in the future that these things are still hard. [laughs] Interpreting visual scenes is not a solved problem, hand-eye coordination, designing, writing computer programs.

These are all things that we still find are not easy to write. They haven't been solved, but perhaps they did find little pieces that made some sense. "There has been great interest in searching for mechanisms possessed of generality and for common programs performing a variety of tasks.

"This search carries the theory to a more complete characterization of the kind of symbol systems that are effective in artificial intelligence." After writing all these programs, you start seeing some patterns, right? You want to find the pieces and parts that you can put together and reuse.

"The search for generality spawned a series of programs designed to separate out general problem-solving mechanisms from the requirements of particular task domains.

"The General Problem Solver was perhaps the first of these, while among its descendants are such contemporary systems as Planner and Conniver. More and more, it becomes possible to assemble large intelligent systems in a modular way from such basic components."

Sometimes I think that with the kinds of compute resources that we have available today, if we were to actually go back and rewrite these original systems, we might actually get a lot more out of them.

I wonder, though. I wonder if that isn't what graduate students do these days: fire up an EC2 cluster and run a modern version of General Problem Solver on it. I think a lot of what they learned was this.

Knowledge was one of the big constraints. You would look at a thing and it would get stuck and you'd say, "Why is this not able to solve this problem?" And it turned out that, "Oh, the system needs to learn that. The system didn't know that a whale was a mammal."

We need to write that down. Even then it runs only a little longer, and then it's like, "Oh, it needs to know the density of water at sea level." Let's write that in there.

"Oh, it needs to know that humans can't breathe under water." Let's write that down too. It turns out that we know billions of facts about the world, if you want to call them facts.

Then, once you try to solve problems, it's not really the logic [laughs] and the reasoning. It's that you don't know enough. Your AI doesn't know enough. There's actually a cool project called Cyc, C-Y-C, that has the goal of creating a database of all these facts.

"If the first burst of research simulated by germ theory consisted largely in finding the germ to go with each disease, subsequent effort turn to learning what a germ was. In artificial intelligence, an initial burst of activity aimed at building intelligent programs for a wide variety of almost randomly selected tasks, is giving way to research aimed at understanding the common mechanisms of such systems."

Looking from 46 years in the future, I don't know if we got much farther. Sorry to say.

Now, he's talking about the other side of this, part two, which is about whether it's necessary to have a symbol system, using humans, human minds, as the subject of study. "The results of efforts to model human behavior with symbol systems are evidence for the hypothesis, and research in artificial intelligence collaborates with research in information-processing psychology.

"Explanations of man's intelligent behavior in terms of symbol systems has had success over the past 20 years, to the point where information-processing theory is the leading contemporary point of view in cognitive psychology." These are broad statements but pretty strong.

The last point is that information-processing theory is the leading view in cognitive psychology. That's fascinating. I don't know enough about cognitive psychology to be able to evaluate that. This idea that the model of the computer is information processing is so clear to us now that it seems basically self-evident.

That it's influencing how we understand the brain, and how we understand, say, the human senses as information processing, is an interesting development. It's another way that the computer is influencing other sciences.

"Research and information-processing psychology involves two main kinds of empirical activity. The first is the conduct of observations and experiments on human behavior in tasks requiring intelligence. The second is the programming of symbol systems to model the observed human behavior.

"The psychological observations lead to hypotheses about the symbolic processes the subjects are using, and these go into the construction of the programs. Thus, many of the ideas for the basic mechanisms of GPS were derived from careful analysis of the protocols that human subjects produced while thinking aloud during the performance of a problem-solving task."

Too many words, man. Too many words. This, I feel, is a little weak, that he's referring again to GPS. What's weak about it is that he clearly divided the hypothesis in two, and said that AI was working on one side of the hypothesis and cognitive psychology was working on the other half.

Now, he's saying that cognitive psychology was influencing the other half, and I just lose it, like, well...is it really divided? I feel like it might have been better not to divide them up. I don't know if that's a meta commentary on his argument or what, but I think what's more interesting is the back and forth, right? The back and forth between cognitive psychology and artificial intelligence.

Cognitive psychologists learn something about how humans think, then the AI folks write it into their programs. That leads to a better program, and then with that better program, you generalize it a little bit and you can tell the cognitive psychologists, "Hey, look for something like this, because it seems to work really well in our programs," and then maybe they find it.

I think that's much more interesting than like the split. OK, the absence of specific competing hypotheses as to how intelligent activity might be accomplished...Oh, sorry, this is one of those ones where I skipped a lot of words. He's talking about other evidence. It's negative evidence.

"The absence of specific competing hypotheses as to how intelligent activity might be accomplished. There is a continuum of theories usually labeled behaviorism to those usually labeled Gestalt theory. Neither stands as a real competitor to the symbol system hypothesis. Neither behaviorism nor Gestalt theory has demonstrated that the explanatory mechanisms account for intelligent behavior and complex tasks.

"Neither theory has anything like the specificity of artificial programs." Basically, he's saying, you can't translate a behaviorist model of the mind into something they can run on a computer. [laughs] To even conceive of what that would look like is making me chuckle, because behaviorist treats the organism as a black box.

They're super into the idea that, well, we can't really know what's happening in your head. It's only self-reported. We're just going to pretend like it doesn't exist or even postulate that it doesn't exist, and just look at the inputs and the outputs.

Like, we are giving you a question. You answer the question right. We reward you. If you answered a question wrong, we shock you or something, punish you. Of course, yeah, but how does it happen? How do you actually come up with the right answer? It's not even on the table. I think he's absolutely right there.

"Knowing that physical symbol systems..." Oh, he's gone on to second part. I should have said that. This is Part 2, heuristic search. The last section was all about physical symbol systems and this hypothesis that it was that symbol systems are sufficient and necessary for intelligence. This is heuristic search.

He's got a different hypothesis here; they number them separately. "Knowing that physical symbol systems provide the matrix for intelligent action does not tell us how they accomplish this. Our second example of a law of qualitative structure in computer science addresses this latter question, asserting that symbol systems solve problems by using the processes of heuristic search.

"This generalization, like the previous one, rests on empirical evidence, and has not been derived formally from other premises."

He keeps saying that. I think that's part of his point. It wasn't like Euclid coming up with the definitions of point and line and then deriving the rest from them. This is actually: we made a program, we ran it, we measured how good it was at chess, and then we did it again, and again, and we learned things from it.

This heuristic stuff, this is what he won the Nobel Prize for, that people use heuristics in their economic activity. They don't optimize or maximize. They satisfice, meaning they have some rule that they follow or a set of rules, and they pick the one that satisfies the need that they have. They're not really trying to maximize all the time.

In fact, you can't, because to maximize you'd have to be able to try out too many options, at least in your head. Simulate too many options and you would just have analysis paralysis. You would never get to actually take action.

Another thing I want to say, and this is more of a personal thing: I was working as a computer scientist, a grad student working in a lab. I was able to generate a new experiment basically every day. I'd write some code during the day, usually just modifying an existing program, I'd run it at night, and then in the morning, I'd look at the results.

I've experienced this firsthand, that you do feel like you are learning, that you're able to formulate hypotheses and falsify them by the next day. It's very, very fast compared to other sciences. My wife, however, was a biologist, and she was generating one data point every few weeks sometimes.

We would talk about how discouraging it was to have so little data after working for so long, whereas I was just making up data every day and throwing it away because it was not...It's like, "No, that's not the answer."

We're in a lucky situation where we have a system we can operate on so quickly, really because the system is so malleable. The nature of the scientific discovery process is just different. You're not trying to characterize a physical system.

What made her work difficult was that it was a cell from a certain part of the brain, and so she had to get at the cell and get a good reading. It was hard, just very physically demanding to work these little scaled-down probes. It's really hard work. Man, computers, you just fire up a hundred machines if you need them these days.

Here's the heuristic search hypothesis. "The solutions to problems are represented as symbol structures, a physical symbol system exercises its intelligence in problem-solving by search -- that is, by generating and progressively modifying symbol structures until it produces a solution structure." I'm going to go through that again. "The solutions to problems are represented as symbol structures."

Remember, these are just expressions as we know them in LISP. "A physical symbol system exercises its intelligence in problem-solving by search -- that is, by generating and progressively modifying symbol structures until it produces a solution structure."

Search, it's looking for the answer by taking a simple structure and generating new ones that are perhaps better than the old one, hopefully, better. "Physical symbol systems must use a heuristic search to solve problems because such systems have limited processing resources.

"Computing resources are scarce relative to the complexity of the situations with which they are confronted. The restriction will not exclude any real symbol systems in computer or human in the context of real tasks."

This is just a summary of what he won the Nobel Prize for. What he's basically saying is, any physical system is limited. Like, in the Turing machine, it had an infinite tape and you could give it infinite time to find the answer to write all the digits of something.

He's trying to say, no, it needs to be practically limited, and it doesn't matter where you put the limit, but it has to, and it's always going to be more limited than the situation in which the system finds itself.

"Since ability to solve problems is generally taken as a prime indicator that the system has intelligence, it is natural that much of the history of artificial intelligence is taken up with attempts to build and understand problem-solving systems." Makes sense.

Now, he's going to talk about problem-solving. "To state a problem is to designate one, a test for a class of symbol structures -- solutions of the problem, and two, a generator of symbol structures -- potential solutions. To solve a problem is to generate a structure using two that satisfies the test of one."

You have two pieces, a generator that's generating potential solutions, and you have a checker that is checking whether those potential solutions are actual solutions. It's a test. I called it a checker, it's a test. "A symbol system can state and solve problems because it can generate and test."

This is a more structural hypothesis. When a general intelligence system, a physical symbol system, is solving problems, it must be doing something like generate and test.
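Here's a bare-bones sketch of that generate-and-test loop in Python, my own toy example, not theirs: a generator proposes candidate solutions, and a test checks each one.

```python
# Bare-bones generate-and-test (my own sketch): a generator proposes
# candidate symbol structures, a test says whether one solves the
# problem.
def solve(generate, is_solution):
    for candidate in generate():
        if is_solution(candidate):
            return candidate
    return None

# Toy problem: find an integer x with x * x == 289.
def gen():
    n = 0
    while True:          # unintelligent generator: just count upward
        yield n
        n += 1

print(solve(gen, lambda x: x * x == 289))  # -> 17
```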

"A simple test exists for noticing winning positions in chess. Good moves in chess are sought by generating various alternatives and painstakingly evaluating them with the use of approximate measures that indicate that a particular line of play is on the route to a winning position."

You're generating, let's say all possible moves. Can this guy move here? Yes. Is that a good move? And then you evaluate whether that's going to lead to a win.

"Before there can be a move generator for a problem, there must be a problem space. A space of symbol structures in which problem situations including the initial and goal situations can be represented. How they synthesize a problem space and move generators appropriate to that situation is a question that is still on the frontier of artificial intelligence research."

Framing the problem. Usually, the problem is framed by the writers of the program, and for a long time that was what you did as an artificial intelligence programmer. You're like, "What if we represented the problem this way?" And you got one percent better results.

"During the first decade or so of artificial intelligence research, the study of problem-solving was almost synonymous with the study of search processing. Consider a set of symbol structures, some small subset of which are solutions to a given problem. Suppose further, that the solutions are distributed randomly through the entire set."

You have this big space and some of them randomly distributed are the solutions to the problem. "No information exists that would enable any search generator to perform better than a random search, then no symbol system could exhibit more intelligence than any other in solving the problem." This is if it's random, right? If it's random, all you can do is just start looking, right?

Just pick up the first one, pick up the second one, pick up the third one. It could be anywhere, so why does it matter which way you do it? "A condition, then, for intelligence is that the distribution of solutions be not entirely random, that the space of symbol structures exhibit at least some degree of order and pattern.

"A second condition is that pattern in the space of symbol structures be more or less detectable. A third condition is that the generator be able to behave differentially, depending on what pattern it detected. There must be information in the problem space and the symbol system must be capable of extracting and using it."

I want to summarize these again. He kind of already summarized it, but he's saying that to be able to respond intelligently, the solutions can't be random in there. There has to be some pattern and you have to be able to detect that pattern.

Your generator shouldn't just be generating zero, then one, then two. It's got to generate something better than that, because the structure is in there. Then it has to be able to behave differently depending on the pattern it detects. Let me say it again.

Can't be random. You have to be able to detect the pattern, and then you have to be able to act on that pattern. Generate different solutions, different potential solutions depending on the pattern you see.

Here, he's going to give an example. "Consider the problem of solving a simple algebraic equation. AX + B equals CX + D. One could use a generator that would produce numbers which could then be tested by substituting in the equation. We would not call this an intelligent generator."

We just generate all the numbers between negative one million and positive one million and test them all: replace X with the number and check whether it satisfies the equation. That would not be intelligent. It's basically just as good as trying them randomly. It's brute force.

"Alternatively, one could use generators that would use the fact that the original equation can be modified by adding or subtracting equal quantities from both sides, or multiplying or dividing both sides by the same quantity without changing its solutions.

"We can obtain even more information to guide the generator by comparing the original expression with the form of the solution and making precisely those changes in the equation that leave its solution unchanged."

You can be smart about this generator. This generator can know something about the operations of addition, subtraction, multiplication, and division and know how you can manipulate this equation so that it doesn't change the solutions so the equation still holds. You subtract B from both sides, stuff like that.

Now, that's one part. The second part is we know what the answer needs to look like. It has to be X equals some number, right? Or something. We want X on one side of the equal sign by itself.

We can compare the solutions we're generating and generate solutions that are leading in that direction. Have we moved stuff that's not X onto this side and stuff that is X onto that side, so you can kind of have some distance calculation. How far are we from a solution and are we getting closer?
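To make the contrast concrete, here's a sketch in Python, my own, not from the lecture, of the two generators he describes for AX + B = CX + D: a dumb one that enumerates numbers and tests them, and an informed one that applies solution-preserving algebra steps and goes straight to the answer.

```python
# Two ways to solve a*x + b == c*x + d (my own sketch). The dumb
# generator enumerates candidates and tests each one; the informed
# generator applies solution-preserving algebra and never generates
# a wrong candidate at all.
def brute_force(a, b, c, d, limit=1000):
    for x in range(-limit, limit + 1):       # unintelligent generator
        if a * x + b == c * x + d:           # test every candidate
            return x

def informed(a, b, c, d):
    # subtract c*x and b from both sides: (a - c) * x == d - b
    # divide both sides by (a - c):   x == (d - b) / (a - c)
    return (d - b) / (a - c)

print(brute_force(3, 4, 1, 10))   # -> 3
print(informed(3, 4, 1, 10))      # -> 3.0
```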

"First, each successive expression is not generated independently, but is produced by modifying one produced previously. Second, the modifications are not haphazard, but depend on two kinds of information. Information that is constant over this whole class of algebra problems and information that changes at each step."

The information that's constant is how the algebraic operations work; the differences that remain between the current expression and the desired expression are what change at each step.

"In effect, the generator incorporates some of the tests, the solution must satisfy so that expressions that don't meet these tests will never be generated." Instead of testing brute-force, we're limiting what we generate to only things that get us closer. Now, we're talking about search trees.

"The simple algebra problem may seem an unusual example of search. We're more accustomed to thinking of problem-solving search as generating lushly branching trees of partial solution possibilities, which may grow to thousands, millions..." Nowadays billions, so the thing is the tree of the algebra problem does not branch. [laughs]

You can always know what to do next. It's clear. You just subtract some stuff out first, and then you divide some stuff, and you're done. You're always just getting a little bit closer with each move. If you're looking at chess, or more complex problems than just algebra problems, you've got branching. Now, what do you do?

"One line of research into game playing programs has been centrally concerned with improving the representation of the chessboard, and the processes for making moves on it, so as to speed up search and make it possible to search larger trees. On the other hand, there is good empirical evidence that the strongest human players seldom explore trees of more than 100 branches.

"This economy has achieved, not so much by searching less deeply than do chess-playing programs, but by branching very sparsely and selectively at each node. This is only possible by having more of the selectivity built into the generator itself."

Now notice, this is just like they said in the biography. These two researchers are more concerned with making an intelligent system that acts the way a human would act. They're not talking about how to make the search better just to make the program better.

They're saying, "We're studying chess masters, and they're not branching as much as we're branching. They must have a generator that's smarter than ours, so that's the direction we're going to go." They would much rather do that than come up with some more efficient way of testing and just brute-force generate more moves.

"The somewhat paradoxical-sounding conclusion is that search is a fundamental aspect of a symbol system's intelligence, but that amount of search is not a measure of the amount of intelligence being exhibited." [laughs] [indecipherable 98:18] Pretty funny thing to say. It's not like, "We need more search. The more search, the smarter the system."

As people get better at a task, like chess...As people get better at a task, they actually search less. That's part of the intelligence, that they recognize areas of the board that they don't even need to look at, or they see that it must be one of these 5 moves, not one of the 20 possible moves. They see that instantly. It reminds me of how the AlphaGo program works.

AlphaGo has two parts. It has the generator, and it has the tester. The generator is itself a deep neural net. It has been trained on lots and lots of games to recognize potential good moves. It will generate what it considers the best moves first. This is just a generator. It's just looking at the board and saying, "Maybe here. Maybe here. Maybe here. Maybe here."

Then there's a test, which looks deeply down the tree of possible moves. If I move here, then they're going to move here. It looks down that. It does some kind of sampling. It doesn't go all the way to a win. It does sampling to limit the amount of search that has to be done. That's how it works.
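Here's a schematic sketch in Python of that generator/test split as I understand it. The function names are hypothetical stand-ins, nothing like DeepMind's actual code: policy plays the generator role, and sampled playouts play the test role.

```python
# Schematic sketch of the generator/test split described above.
# `policy` and `simulate_playout` are hypothetical stand-ins, not
# DeepMind's real API: policy proposes a few promising moves (the
# generator), and sampled playouts estimate how likely each move is
# to win (the test), instead of searching the whole game tree.
def choose_move(board, policy, simulate_playout, n_samples=100):
    candidates = policy(board)        # generator: a handful of good moves

    def sampled_win_rate(move):
        # test: estimate value by sampling playouts, not exhaustively
        wins = sum(simulate_playout(board, move) for _ in range(n_samples))
        return wins / n_samples

    return max(candidates, key=sampled_win_rate)
```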

Here's the thing. I've heard people say that just the generator, without refining its proposals with a score of how likely you are to win down each branch, is already a really high-level player, a strong amateur. It just takes the move it says is the best one it can see. Boom. It doesn't even look ahead. [laughs]

It just says: probably there, move there. It's not analyzing deeper than that. It just looked at the board and thought, there. That player would beat you, unless you're a master. I don't know who you are, but it would beat me, and I know how to play Go.

It's not beating me because I don't know the rules or something like that. I know the rules. I'm not very good, but it would beat me. I find that really amazing, that the generator has been trained to be that smart. Yeah.

Then there's the test, which is also pretty cool. It's sampling to see how likely you are to win if you go down a branch. It's not actually looking at all the possibilities. It's just sampling. "When the symbol system knows enough about what to do, it simply proceeds directly toward its goal.

"Whenever its knowledge becomes inadequate, it must go through large amounts of search before it finds its way again." That's really interesting. I think that is very human. When we don't know much about a domain, we have to do a lot of trial and error, a lot of search, basically. The more you learn, the more directly you go right to the answer.

"The potential for exponential explosion warns us against depending on the brute force of computers as a compensation for the ignorance and unselectivity of their generators. The hope is still ignited in some human brains that a computer can be found that is fast enough, and that can be programmed cleverly enough, to play good chess by brute-force search."

This makes me think a lot about AlphaGo again. I don't remember, but the cost of just training AlphaGo, so not all the experiments that led up to how it works and all the programming time and all that, but just the training like, "Let's train this thing on like billions of games," costs millions of dollars.

At some stage, we are doing what he warned about, right? Training a neural net can also be modeled as search. We're deferring the problem. The learning part is now this brute-force search, basically, and we have the resources to do it. That scares me. It scares me.

I often think that the way we're doing neural net training these days, with the amount of data points, the amount of processing time that they require, is more like rerunning. It's more on the scale of evolution than it is on learning within your lifetime.

It's like rerunning, recapitulating, like the development of the visual cortex to do vision, right? It's not saying, "Well, we have a visual cortex. We know how to make one of those. Now, let's teach it to recognize cats."

It's saying, "Let's just start with nothing. Just big neural net, and show it millions and billions of cats and other stuff, and it'll distinguish between them. The weights of those neural nets will determine the structure that is needed to do that."

That's why it takes so much time to train these things, and so many data points. It's like running through the whole Cambrian explosion. This is a metaphor, of course. It's not actually recapitulating those things.

The Cambrian explosion, a lot of it happened because of the development of the eye. Now you could see, you could tell the difference between light and dark, with just a little patch that's light-sensitive. Then predators could see what they were trying to eat, so prey had to learn to see predators to get away from them.

Boom! This big explosion of different strategies for survival. That's what I see. That's why it takes that many resources and that much energy. If you just look at the energy expenditure. Of course, it's much more directed, you don't have to worry about eating, you don't have to worry about finding a mate, all that stuff is gone.

It's just: play Go. [laughs] Just recognize cats. But it's still on that same order of magnitude, that amount of culling and shaping of the neural net.

Of course, there's reuse of neural nets. You can train a neural net to do one task, let's say a visual task, and then kind of chop off the last two layers and put two new layers on it. It's already got most of the visual cortex in there, so you don't have to redo that again.
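Something like this, as a minimal PyTorch-flavored sketch, my own illustration of the idea, not anyone's production model: keep the trained feature extractor, freeze it, and bolt a fresh head on top.

```python
import torch.nn as nn

# Minimal sketch of that reuse (my own illustration): a stand-in for
# a trained feature extractor, frozen so we don't redo the "visual
# cortex", with two fresh layers for the new task.
pretrained_features = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.ReLU(),
    nn.Conv2d(16, 32, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in pretrained_features.parameters():
    p.requires_grad = False               # freeze the reused layers

new_head = nn.Sequential(                 # the "new two layers"
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 2),                     # e.g. cat vs. not-cat
)
model = nn.Sequential(pretrained_features, new_head)
```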

Still, it seems like we're not doing this the way he was talking about where we're looking at how people see and building that into our system. Well, I won't comment on that anymore.

He's talking a lot about AI and I have a master's degree in AI so I have opinions on this stuff. "The task of intelligence is to avert the ever-present threat of the exponential explosion of search.

"The first route is to build selectivity into the generator. The usual consequences to decrease the rate of branching not prevented entirely. Ultimate exponential explosion is not avoided, but only postponed. Hence, an intelligent system generally needs to supplement the selectivity of its solution generator with other information using techniques to guide search."

You can't prevent exponential explosion, certainly not generally, so we need these other information-using techniques. We've got to guide the search. What does that mean? Which path do we go down?

We've got a branch here, 10 different things, and you've got to choose which one to go down. "20 years of experience with managing tree search in a variety of task environments has produced a small kit of general techniques, which is part of the equipment of every researcher in artificial intelligence today.

"In serial heuristic search, the basic question always is, what shall be done next? In tree search that question in turn has two components. One, from what node in the tree shall we search next? Two, what direction shall we take from that node?" I think that's pretty self-explanatory.

"The techniques we have been discussing are dedicated to the control of exponential expansion rather than its prevention. For this reason, they have been properly called weak methods. It is instructive to contrast a highly structured situation which can be formulated say as a linear programming problem.

"In solving linear programming problems, a substantial amount of computation may be required, but the search does not branch." He just wants to say that it's not really about the amount of computation. It's more about this branching.

He talks about some other...What do you call it? Other approaches besides this generate and test approach, but I'm not going to go down there. "New directions for improving the problem-solving capabilities of symbol systems can be equated with new ways of extracting and using information. At least three such ways can be identified." He's talking about future work. Like, where do we go from here?

I do want to say, I think it's clear by now, from how much I've read about AI, that this is a pretty good summary of AI, at least the paradigm where it was about search: problem representation, generating solutions in the problem space, testing whether they're a good solution. How you generate better solutions is all about search.

I think they've lost the thread of science at this point. [laughs] They're just summarizing AI. Now, they're going into future work in AI. Again, that strengthens what I'm saying. This is good evidence for that.

"New directions for improving the problem-solving capabilities of symbol systems can be equated with new ways of extracting and using information. At least three such ways can be identified.

"First, it has been noted by several investigators that information gathered in the course of tree search is usually only used locally to help make decisions at the specific node where the information was generated.

"Information about a chest position is usually used to evaluate just that position, not to evaluate other positions that may contain many of the same features. Hence, the same facts have to be rediscovered repeatedly at different nodes of the search tree."

A simple example: sometimes you have two moves that, if you do them in different orders, produce the same position. Let's make it easy. First, you move this pawn. They move their pawn. Then you move this other pawn. They move their other pawn. The board is the same no matter which order you move the pawns in. It's the same end state, but in the way you generate the game tree, those are two different branches.

The fact that the board looks the same at those two different branches, they're not using that information. We've already looked at this board, but it was in that branch. It's hard to do.
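The usual fix, as I understand it, is a transposition table: memoize evaluations by position, so two branches that reach the same board share the work. A tiny sketch in Python, with a hypothetical evaluate function:

```python
# Transposition table sketch (my own, with a hypothetical `evaluate`
# function): positions reached by different move orders produce the
# same key, so the evaluation is computed once and reused.
transposition_table = {}

def evaluate_with_memo(position, evaluate):
    key = hash(position)                  # same board => same key,
    if key not in transposition_table:    # whatever move order got here
        transposition_table[key] = evaluate(position)
    return transposition_table[key]
```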

"A few exploratory efforts have been made to transport information from its context of origin to other appropriate contexts. If a weakness in a chess position can be traced back to the move that made it, then the same weakness can be expected in other positions descended from the same move."

"Another possibility. A second act of possibility for raising intelligence is to supply the symbol system with a rich body of semantic information about the task domain it is dealing with. For example, empirical research on the skill of chess masters shows that a major source of the master's skill is to recognize a large number of features on a chessboard.

"The master proposes actions appropriate to the features recognized." Just having this encyclopedic knowledge of the features like, "Oh, this shape here. I've got to move my knight to block this..." They're thinking at this high level. They're not thinking at the low level that usually the chess programs are thinking.

"The possibility of substituting recognition for search arises because a rare pattern can contain enormous amounts of information. When that structure is irregular and not subject to simple mathematical description, the knowledge of a large number of relevant patterns may be the key to intelligent behavior.

"Whether this is so, in any particular task domain is a question more easily settled by empirical investigation than by theory." This goes back to the ghost stuff where neural nets are really good at detecting these patterns and learning these patterns, so that could be more know what is talking about. Why go programs can work now?

"A third line of inquiry is concerned with the possibility that search can be reduced or avoided by selecting an appropriate problem space." This is interesting. Some problems can be solved by using a better representation, and in that new representation, the solution is way more obvious.

Maybe you could say it requires less branching. If you try to do it in the regular representation, you have a hard problem, but this other one is much easier. He gives an example where you have a checkerboard. You have 32 tiles that are a one by two rectangle. Each tile can cover two squares, right?

You have 32 of these, right? You can cover the whole board with these 32 tiles. They each cover two squares, and there are 64 squares on a checkerboard. All right. Now, here's the problem. If you remove opposite corners of the checkerboard, so you've removed two squares, can you now tile it with 31 of these tiles?

Can you cover it? Well, you could brute-force it and try every single possible combination to see if you can do it. But you could also change the representation. You could say, "Wait a second. Each tile covers both a black and a white square."

There's no way to...because it's a checkerboard pattern, you don't have two black squares next to each other or two white squares next to each other. Every one by two tile has to cover a black and a white. If we remove the two opposite corners, those are always the same color. We remove two whites or we remove two blacks.

There's no way now. If we remove the two white corners, then there are two black squares that don't have a white square to pair with, so we can't place the tiles. Wow, we solved the problem. We didn't have to brute-force the search. But how do you find that representation?
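You can even check that counting argument in a few lines of Python, my own sketch of the argument, obviously not from the lecture:

```python
# The mutilated-checkerboard argument as a counting check (my sketch):
# a domino always covers one black and one white square, and opposite
# corners share a color, so removing them unbalances the colors.
squares = [(r, c) for r in range(8) for c in range(8)]
remaining = [sq for sq in squares if sq not in [(0, 0), (7, 7)]]
blacks = sum((r + c) % 2 == 0 for r, c in remaining)
whites = len(remaining) - blacks
print(blacks, whites)   # 30 and 32 -- unequal, so no tiling with
                        # 31 one-black-one-white dominoes can exist
```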

We've avoided all this search by changing the representation. Now, he says, "Perhaps, however, in posing this problem, we are not escaping from search processes. We have simply displaced the search from a space of possible problem solutions to a space of possible representations. This is largely unexplored territory in the domain of problem-solving research."

And I said before that often in AI, it's still the human who represents the problem. Even in neural nets, you come up with this space of vectors that's supposed to represent all possible solutions. I'm going to end now by reading what I read at the beginning. Maybe it will make more sense. A.M. Turing concluded his famous paper on Computing Machinery and Intelligence with the words, "We can only see a short distance ahead, but we can see plenty there that needs to be done."

Many of the things Turing saw in 1950 that needed to be done, have been done, but the agenda is as full as ever. Perhaps, we read too much into his simple statement above, but we like to think that in it, Turing recognized the fundamental truth that all computer scientists instinctively know.

For all physical symbol systems, condemned as we are to serial search of the problem environment, the critical question is always, what to do next?

Quite a paper, well worth a few reads. It's two of the greats that, unfortunately, are easy to forget about and overlook, but, man, their work was really important for artificial intelligence.

I don't know if I've stated this before sometime on the podcast, but I feel like the artificial intelligence project, the original one, to create machines that can do things that normal humans can do, that was very important in computer science. It's often pooh-poohed as like, "Oh, what has AI ever done?"

We have to remember that even stuff like compiling a program, parsing, and compiling a Fortran program was considered artificial intelligence. It was considered automatic programming, and so we have to give them credit.

What happens is they generate all this cool stuff on the way to trying to make a better chess program, and then it becomes more widely practical and leaks out into the rest of computer science. That's the joke: once it works, they stop calling it AI, and it's just computer science.

Look at the list of stuff that IPL had: data structures, data types and their operations, dynamic memory allocation. This is stuff we take for granted these days, but it was part of the path to artificial intelligence. Without that path, I feel like the situation of computing today would be a lot poorer than what we have now.

Thank you, Herb Simon and Allen Newell. I do want to recommend Herb Simon's book, it's called, Sciences of the Artificial. It goes much deeper than this short lecture did. That's all I have to say.

Thank you for listening, and as always, rock on.

Eric Normand: A.M. Turing concluded his famous paper on "Computing Machinery and Intelligence" with the words, "We can only see a short distance ahead, but we can see plenty there that needs to be done." Many of the things Turing saw in 1950 that needed to be done, have been done, but the agenda is as full as ever.

Perhaps we read too much into his simple statement above, but we like to think that, in it, Turing recognized the fundamental truth, that all computer scientists instinctively know, for all physical symbol systems condemned as we are two serial search of the problem environment. The critical question is always, what to do next?

Hello, my name is Eric Normand. Welcome to my podcast. Today, I am reading from the ACM Turing Award lecture from 1975. Alan Newell and Herbert Simon were awarded it jointly. Their Turing award lecture is called "Computer Science as Empirical Inquiry -- Symbols and Search."

As you may know, if you've listened before, I like to read from the biography. Of course, there's two biographies, because there are two different recipients of the 1975 ACM Turing Award. I'll read the first one, Allen Newell. He's actually the younger of the two. Well, I'll just get in there.

He was born in 1927. This award was given in 1975. He was 48 years old when he received it. He co-received it for making basic contributions to artificial intelligence, the psychology of human cognition, and list processing.

I think when they use the term basic here, basic contributions, I think that's a good thing. Not basic, like simple and easy, but basic like fundamental, which is true. They were part of the early crowd of the artificial intelligence group. OK, I'll read some more stuff from his biography. You should definitely read the whole thing, but I'll just read and comment on the parts that stood out to me.

"Newell is chiefly remembered for his important contributions to artificial intelligence research, his use of computer simulations in psychology, and his inexhaustible, infectious energy. His central goal was to understand the cognitive architecture of the human mind and how it enabled humans to solve problems.

"For Newell, the goal was to make the computer into an effective tool for simulating human problem-solving. A computer program that solved a problem in a way that humans did not or could not was not terribly interesting to him, even if it's solved that problem better than humans did."

This is an interesting statement, especially compared to the current approach, the current dominant paradigm in artificial intelligence and machine learning these days, which is deep neural networks. The way they learn is just through millions, if not billions of data points, and it's certainly not how we learn.

It's interesting that at the beginning, there were people there really searching for how we learned. AI was almost a mirror. It was a tool for learning about ourselves. I find that really interesting. That's always what drew me to AI. When I was still in school, that's what I found interesting, that it was telling me about myself.

"As part of this work on cognitive simulation, Newell, Simon and Shaw developed the first list-processing language, IPL, which, according to Simon, introduced many ideas that have become fundamental for computer science in general, including lists, associations, schemas, also known as frames, dynamic memory allocation, data types, recursion, associative retrieval, functions as arguments, and generators, also known as streams.

"John McCarthy's LISP, which became the standard language in the AI community after its development in 1958, incorporated these basic principles of IPL in a language with an improved syntax and a garbage collector that recovered unused memory."

I might have it in a quote somewhere else. IPL was more like an assembly language, but it did have all these other features, like a list of data structure. You could allocate memory dynamically, have other data types. That's really interesting.

The other recipient, Herbert Alexander Simon, also known as Herb Simon, his birthday was in 1916, makes him 58 at the time of reception. Did I do that right, that math? 59? Yeah, 59. He was his professor. He was Allen Newell's professor for a long time.

Apparently, it says that they were equals. It was just that Herb Simon was the older and already had a position. He had Newell come as a grad student so that they could work together, but they actually were peers.

The citation, as they call it, is the same, so I'll read from his biography. Pretty interesting character. "The human mind was central to all of Simon's work, whether in political science, economics, psychology, or computer science.

"Indeed, to Simon, computer science was psychology by other means. Simon's remarkable contributions to computer science flowed from his desire to make the computer an effective tool for simulating human problem-solving."

It's very similar to Alan Newell's goal of simulating the way humans think, not just solving the same kinds of problems that humans can solve. It has to do it in the same way.

The other thing is it talks about his work in economics. He won the Nobel Prize in economics. Herb Simon coined the term satisficing and it was this breakthrough idea in economics that you couldn't optimize. You could only satisfice.

There's too much information for a person to process and choose the perfect solution, to satisfying their needs and doing a cost-benefit analysis. You have to use heuristics to make progress in the world. Often, what people do is choose whatever works. The first thing that comes to mind that might work and not optimizing.

He won the Nobel Prize for that. It seems like common sense today, but that's his influence on us. That's how we think these days.

"In addition to employing principles of heuristic problem solving, the Logic Theorist was an error-controlled feedback machine that compared the goal state with the current state and formed one of a small set of basic operations in order to reduce the difference between the two states. The Logic Theorist was a remarkable success.

"Simon Newell and Shaw elaborated on its basic principles in creating another renowned program, the General Problem Solver or GPS, in 1957 and '58. The GPS was not quite as universal as its name implied, but it was startlingly good at solving certain kinds of well-defined problems. Even more GPS, like LT, appeared to solve them in much the same ways that humans did."

This does go back -- we're turning 1957, 1958 -- to a different paradigm of artificial intelligence where they were much more closely tied to, say, the psychology department. They were doing psychological experiments to understand how people solve problems.

They would give the person a problem. They would ask them to solve it and talk. They would train them to talk about what they were thinking about at the time. They would try to figure out what steps did they take to get to the solution, and how do we generalize that?

He, Simon, said that, "We need a less God-like, and more rat-like, picture of the chooser." LT and GPS were intended to create just such rat-like models of how people actually solve problems in the real world. Rat-like meaning just mechanical and animal, and not some all-knowing, all-seeing entity.

He was a strong, even fierce advocate of the computer program as the best formalism for psychological theories, holding that the program is the theory. The fullest statement of this belief was the monumental text, "Human Problem Solving," authored by Simon and Newell in 1972, in which they introduced the notion of a program as a set of production systems or if-then statements.

Here again, we see that he was this proponent of the idea that the best way to understand a person or psychology -- people in general -- is through computer programs because you can actually formalize the thoughts, the thought processes in a way that other theories of mind do not.

He talks about this a little bit in the lecture about how behaviorism and Gestalt's theory and all these other theories are so vague. You can't really use them to make predictions or anything. You need some mechanism, something, and a computer is a good simulation of that.

The flip side of this coin was his insistence that computer simulation was an empirical science that taught us new and valuable things about ourselves and our world. Simulation was not an exercise in elaborating tautologies. This is what the main topic of the talk is, computer science as empirical inquiry, that computer science is a science. It's a kind of science.

We'll get more into it in the paper. Last, but not least, Simon believed that organization and structure were critical. What his computer simulations simulated was not the actual physical operations of neurons in the brain, but rather the structure of problem-solving processes.

The computer program thus could be a structural model of the mind in action, not a model of its specific physical make-up. Two of the key conclusions he drew about the structure of our human mental processes are that they are hierarchical and that they are associative.

He believed that they have a tree structure with each node/leaf linked to a branch above it. Each leaf could either be one thing or a set of things, a list of things to be precise.

Since items on a list could call items on other lists, this model of the mind could work associatively within its basic hierarchical structure, creating webs of association amongst the branches of the mind's tree. He doesn't go too much into that in the talk.

It's been a while since I've done one of these lectures. I actually printed this out months ago and have been working through it. I've read it several times. I think it's well worth it.

Herb Simon is the only person, I think, to have won both a Nobel Prize and a Turing Award. He's kind of a big deal. He is an important person that we should recognize and, "Oh, I have it on my shelf."

He wrote a book that's called "Sciences of the Artificial." Also, definitely worth a read. Not very long. But as Alan Kay said, "He won a Nobel Prize and a Turing Award. Read his book. Come on. Why not?" He's an important figure and I think the big themes in this lecture are the topic of the book. I'll probably have some comments that I remember reading in the book.

I didn't re-read the book for this. I probably should have, but the break between episodes was already getting too long, so I didn't have a chance to read it again.

Let's get into it. Just some information before I start. I don't read the whole thing. It's actually pretty long. It's 13 pages. I just pick out phrases and sentences, maybe whole paragraphs that I think are worth commenting on, that I have something to say about.

Another thing is this gets pretty wordy. He's not a concise writer. There's a lot of lilt to what he says, a lot of intellectual flourishes of his speech. I don't know who's actually doing the writing here, but it seems like someone who's used to being professorial.

I did my best in some places to skip words and sometimes whole phrases because they didn't really add anything, and this is already going to be too long. I did my best, let me put it that way.

Some people are super succinct. You can find one sentence that says it all, and I just have to read that. Other people are like, "OK." Like, "The point starts at the top here, and now, I'm at the bottom and I see the other half of the point." He made this long thing in between, like, "Can I skip it?" It's hard to figure out. OK, so let's just start at the beginning.

"The machine -- not just the hardware, but the programmed living machine -- is the organism we study."

He's still in the first paragraph here, in the introduction. When most people who have won the award start, they often refer to the past lectures, trying to put some spin, some perspective, on what they can tell us. I'm going to skip the part where he explicitly talks about the other lectures, but he's contrasting his view with the other views.

For instance, Dijkstra had that famous quote in his Turing lecture that, "Computer science is not about the machines, it's about the programs." Just like astronomy is not about telescopes, computer science is not about computers. Something like that.

Well, Simon and Allen Newell, they're saying that it's the running machine with software on it, the behavior of the machine, which I find is very different from the people who have come before. It's not about the software, the abstractions. It's not about the hardware and how you do stuff efficiently.

We've seen people talk about that, but their view is that it's the running, living programmed hardware. I love that, because it's a totally new view from his book where he really talks about how it's a new thing to study.

There are properties of it that we don't understand and that we can empirically determine. For instance, how do search algorithms work, and things like that. Those are the kinds of things we can empirically study by running programs on computers.

Here he goes into the main topic. "Computer science is an empirical discipline. Each new machine that is built is an experiment. Actually constructing the machine poses a question to nature, and we listen for the answer by observing the machine in operation and analyzing it by all analytical and measurement means available."

He continues, "Each new program that is built is an experiment. It poses a question to nature and its behavior offers clues to an answer. Neither machines nor programs are black boxes. They are artifacts that have been designed, both hardware and software, and we can open them up and look inside."

Here, he's laying out the whole thesis that this thing that we've created, these machines with software that runs on them, they're like a whole new world that we can do experiments on.

"As basic scientists, we build machines and programs as a way of discovering new phenomena and analyzing phenomena we already know about. Society often becomes confused about this believing that computers and programs are to be constructed only for the economic use that can be made of them.

"It needs to understand that the phenomena surrounding computers are deep and obscure, requiring much experimentation to assess their nature. It needs to understand that, as in any science, the gains that accrue from such experimentation and understanding pay off in the permanent acquisition of new techniques.

"And that, it is these techniques that will create the instruments to help society in achieving its goals. Our purpose here, however, is not to plead for understanding from an outside world. It is to examine one aspect of our science, the development of new basic understanding by empirical inquiry."

Let's talk a little bit about empiricism and computer science. I actually read two books at the same time: Herb Simon's "Sciences of the Artificial" and another book called "Algorithms to Live By." I felt that they complemented each other. Algorithms to Live By gave some good examples of phenomena that computer science elucidates. One of them was why a bigger load of laundry takes longer than a smaller load.

It's quite simple: if you sort your clothes, sorting is more than order N. Order N log N is the best a comparison sort can do. The bigger your load, the longer it takes, and it grows more than linearly.

Smaller batches of clothing should be faster to do. Of course, we probably don't sort our clothes as much as we used to. But you get the idea that this can teach us why certain things in our everyday lives take longer. Why is it easy to sort a small set of, let's say, playing cards versus the whole deck?

With the small set, you can see the whole thing and keep it all in your head, and boom, it's in order. If there are 52 of them, you can't see them all. You're moving around a lot. It takes a long time.
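You can actually watch that superlinear growth. Here's a quick timing sketch in Python (my example, not from the book):

```python
# Rough timing sketch: sorting cost grows more than linearly with input size.
import random
import time

for n in [10_000, 100_000, 1_000_000]:
    items = [random.random() for _ in range(n)]
    start = time.perf_counter()
    items.sort()  # comparison sort, O(n log n)
    elapsed = time.perf_counter() - start
    print(f"n={n:>9,}  sort took {elapsed:.4f}s")

# Ten times the laundry takes roughly more than ten times as long to sort.
```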

I think that's the thing he's talking about. It might be a very simple, basic example, but if you extrapolate it: now let's see if we can explain what's happening in a human mind by writing a program that does the same thing.

We can understand that program and what it's doing. By analogy, what must be happening in that mind, and why certain things work better in the program, does that match up with what we see in the mind? It's a probe into complex systems that are hard to understand by themselves, that we have this tool of simulation that we couldn't have before.

Another thing is he's going to go deeper into science and also into artificial intelligence. This is maybe one of the best short summaries of artificial intelligence at that time that I've ever read, so we'll get into that too.

Finally, I just want to say that he's talking about...He's going to examine one aspect of our science, the development of new basic understanding by empirical inquiry. He's going to give some examples, some illustrations. These are from their own work. They were big in artificial intelligence so a lot of them are going to be artificial intelligence examples.

"Time permits taking up just two examples. The first is the development of the notion of a symbolic system. The second is the development of the notion of heuristic search.

"Both conceptions have deep significance for understanding how information is processed and how intelligence is achieved. However, they do not come close to exhausting the full scope of artificial intelligence, though they seem to us to be useful for exhibiting the nature of fundamental knowledge in this part of computer science."

They're two examples that they're going to give, symbolic systems and heuristic search.

"One, symbols and physical symbol systems. One of the fundamental contributions to knowledge of computer science has been to explain, at a rather basic level, what symbols are. This explanation is a scientific proposition about nature. It is empirically derived with a long and gradual development."

This is a big, mysterious statement. It took me a long time to get this. Symbols: I think he's referring not to symbols in general, but symbols as in LISP symbols. He claims that they're...they claim, sorry, that the same kinds of things are happening in our minds, and we'll get to that.

A LISP symbol is just a string, a different type, but it's just a string of characters, and it can represent something. We'll get to that in a second. The important thing is that it's empirically derived. They've done a certain number of experiments that got to that point, that it wasn't just arbitrary. It wasn't like, I don't know, "We just invented this thing and it works."

"Symbols lie at the root of intelligent action, which is, of course, the primary topic of artificial intelligence. For that matter, it is a primary question for all computer science. For all information is processed by computers in the service of ends, and we measure the intelligence of a system by its ability to achieve stated ends in the face of variations, difficulties, and complexities posed by the task environment.

"This general investment in computer science in attaining intelligence is obscured when the task being accomplished are limited in scope, for then the full variation in the environment can be accurately foreseen.

"It becomes more obvious as we extend computers to more global, complex, and knowledge-intensive tasks, as we attempt to make them our own agents, capable of handling on their own the full contingencies of the natural world."

He's just saying that, sometimes, you have these simple examples, and they're so small that it doesn't look intelligent. You can see that it's just obviously a little mechanism. As you have to deal with unforeseen things, you need more intelligence. The decisions have to be smarter. Those are the kinds of systems that require a symbol system.

"Our understanding of the system's requirements for intelligent action emerges slowly. It is composite, for no single elementary thing accounts for intelligence in all its manifestations. There is no 'intelligence principle,' just as there is no 'vital principle' that conveys by its very nature the essence of life.

"But the lack of a simple deus ex machina does not imply that there are no structural requirements for intelligence. One such requirement is the ability to store and manipulate symbols."

He's trying to make an analogy. They are trying to make an analogy -- sorry, I keep doing that -- that just as there's no one thing that makes something alive, there's no one thing that makes something intelligent. But some things are required. You can't just throw up your hands and say, "There's no way to describe it." There is a requirement, and that is the ability to store and manipulate symbols.

Now, he's going to go on a little detour and talk about science, because he has to lay out a little background.

"All sciences characterize the essential nature of the systems they study. These characterizations are invariably qualitative in nature, for they set the terms within which more detailed knowledge can be developed."

They define the terms qualitatively, and this lays the groundwork for what you can talk about in that field. "A good example of a law of qualitative structure is the cell doctrine in biology, which states that the basic building block of all living organisms is the cell. The impact of this law on biology has been tremendous, and the lost motion in the field prior to its gradual acceptance was considerable."

This is a really interesting statement here. This was 1975, remember. I learned this cell doctrine, cell theory in biology class in high school, that all living things are made of cells.

There's like three pieces to it, but it's basically that everything is made of cells, cells are really small, and nothing is smaller than a cell, something like that. The first thing that people nowadays say when you bring up this cell theory is, "What about viruses? Aren't viruses alive? They have DNA. They multiply. They're alive."

We have a different view today that doesn't always mesh with the cell theory. What I think is difficult is that, in science, you can have a theory that's not 100 percent right, and it's still productive. It's still useful.

He says that the impact of this law on biology has been tremendous because, before that, we had no idea how the body worked. Once you could break stuff up into cells, you can start doing cell physiology and understand, what's happening inside one cell, that that's an important thing to understand. It really opened up the field.

Does that mean that they didn't include viruses as alive? Yes, they did not. That's fine. There is still a lot to learn with this frame. I think that it takes a lot of science education to realize that these theories are just mental toys.

They are models that include and exclude parts of the phenomena. They're just useful for thinking. They're useful for framing a problem, for deciding what to work on next, things like that.

I just wanted to talk about that because he brought up cell doctrine, and I thought it was a good chance to talk about that aspect of science, especially when we're now almost 50 years after this, two paradigms later in artificial intelligence. It doesn't invalidate what they were doing, that there's now a different dominant paradigm.

"The theory of plate tectonics asserts that the surface of the globe is a collection of huge plates -- a few dozen in all -- which move against, over, and under each other into the center of the Earth, where they lose their identity."

Just another example of a big theory that's qualitative and foundational. Once you have it, it paints a totally different picture of what's going on, in this case, on the Earth with the continents.

"It is little more than a century since Pasteur enunciated the germ theory of disease. The theory proposes that most diseases are caused by the presence and multiplication in the body of tiny, single-celled, living organisms, and that contagion consists in the transmission of these organisms from one host to another."

Another example. And another one: "The doctrine of atomism offers an interesting contrast to the three laws of qualitative structure we have just described."

"The elements are composed of small, uniform particles differing from one element to another, but because the underlying species of atoms are so simple and limited in their variety, quantitative theories were soon formulated, which assimilated all the general structure in the original qualitative hypothesis."

OK, just talking about atoms and elements, how this leads to chemistry. Now, he's going to conclude about this. "Laws of qualitative structure are seen everywhere in science. Some of our greatest scientific discoveries are to be found among them. As the examples illustrate, they often set the terms on which a whole science operates."

He's going to lay out two of those for computer science. Artificial intelligence, really.

"Let us return to the topic of symbols, and define a physical symbol system. A physical symbol system consists of a set of entities, called symbols, which are physical patterns that can occur as components of another type of entity called an expression or symbol structure. Thus, a symbol structure is composed of a number of instances of symbols related in some physical way such as one token being next to another.

"At any instance of time, the system will contain a collection of these symbol structures. The system also contains a collection of processes that operate on expressions to produce other expressions, processes of creation, modification, reproduction and destruction.

"A physical symbol system is a machine that produces through time an evolving collection of symbol structures." This is a physical symbol system. Over time, there is a collection of expressions that changes over time.

"Two notions are central to the structure of expression, symbols and objects, designation and interpretation. Designation. An expression designates an object if given the expression, the system can either affect the object itself or behave in ways dependent on the object."

It says affect. I think they might also mean effect, as in the system can have an effect on the object itself. Either way, a symbol can designate something, meaning it can refer to or represent something, if the system can change that thing or behave in ways dependent on it.

Like if you say, "that cat," the system can maybe poke the cat, right? It somehow knows that the symbol, the expression "that cat," refers to something that can also be poked with a rod. That's affecting it. Then there's behaving in ways dependent on the object: if the cat moves, maybe your expression about that cat changes.

Interpretation. This is the second thing. "The system can interpret an expression if the expression designates a process, and if given the expression, the system can carry out the process." Expressions can designate a process, and he's calling it interpretation when you run it.

We get a lot of Lisp vibes from this: you can make expressions out of symbols and then run them.
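Here's one way to render designation and interpretation in code. This is a minimal sketch in Python rather than the IPL or Lisp they actually used, and all the names are mine:

```python
# A toy physical symbol system: symbols, expressions, designation, interpretation.
# A sketch of the ideas, in Python rather than IPL or Lisp; the names are mine.

environment = {}  # maps symbols to the objects they designate

def designate(symbol, obj):
    """A symbol designates an object: the system can now behave
    in ways that depend on that object."""
    environment[symbol] = obj

def interpret(expression):
    """An expression designating a process can be carried out."""
    op, *args = expression
    process = environment[op]  # the symbol designates a process
    values = [environment[a] if a in environment else a for a in args]
    return process(*values)

designate("cat", {"name": "Whiskers", "location": "sofa"})
designate("poke", lambda cat: f"poked {cat['name']} on the {cat['location']}")

# The expression (poke cat) designates a process applied to a designated object.
print(interpret(("poke", "cat")))  # => poked Whiskers on the sofa
```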

Now, he's got this hypothesis. "The physical symbol system hypothesis. A physical symbol system has the necessary and sufficient means for general intelligent action. By necessary we mean that any system that exhibits general intelligence will prove upon analysis to be a physical symbol system.

"By sufficient, we mean that any physical symbol system of sufficient size can be organized further to exhibit general intelligence. By general intelligent action, we wish to indicate the same scope of intelligence as we see in human action."

I don't want to get into the debates about whether general intelligence is even a good term. General just...it admits too much, right? They do define it, saying it's the same scope that we see in human action: the ability to deal with a lot of nuance and variability in our environment.

Those things always bug me, because it's very rare...Yes, humans can play the violin really well, and they can play chess really well, and they can do acrobatics and not fall down, let's say. It's true, but it's rare that you have a single person who does all three very well. [laughs] It's not fair to expect the same machine to do all three.

I worked in robotics, so I have a lot of those kinds of feelings, like, "It's not fair. Humans fall down all the time." I wanted to talk about this designation again. That's really where the meaning comes from. In this view, the thing that connects the symbol to an object in the real world is the ability to affect that thing, to poke the cat, or to behave differently when the object changes.

If the cat moves, maybe your head turns because you recognize the cat. It's a very mechanistic view of meaning, but I can't think of a better one. Meaning is one of what Minsky calls suitcase words, where you just pack whatever you want in there. The meaning we often think of is human meaning, with a lot of emotional investment in the thing.

He's excluding, let's say, the medium on which the symbol system runs. That's not as important as the fact that it is a symbol system. Maybe your symbol is the word love, or your symbol is oxytocin, some hormone that flows and sends a message to all cells: "Oh, this is what we're doing now."

It's a much more basic one than the word, one that exists in more animals and things, but it's serving the same purpose, and perhaps that's...I don't know. I don't know what else to say about that. We now need to trace the development of this hypothesis and look...Oh, wait, I missed something.

This is an empirical hypothesis. The hypothesis could indeed be false. We now need to trace the development of this hypothesis and look at the evidence for it. He's stating something pretty strong here.

The hypothesis itself is pretty strong, necessary and sufficient for intelligence, it's very strong, but it's falsifiable. If you could construct an intelligent system that does not have a symbol system, then it wouldn't be necessary.

Or if you could make a symbol system that was complex enough, of sufficient size as he says, but that you couldn't organize into a general intelligence, then it's also falsified. It's still a hypothesis. We haven't developed a general intelligence system.

"A physical symbol system is an instance of a universal machine. Thus the symbol system hypothesis implies that intelligence will be realized by a universal computer. However, the hypothesis goes far beyond the argument, that any computation that is realizable can be realized by a universal machine, provided that it is specified."

Universal machine: he's talking about a Turing machine. He's saying that, yes, intelligence is runnable on a Turing machine. I don't know if that is still something that people argue about. I've accepted it for myself, that a Turing machine could do what I do if we knew how to program it.

I think some people still hold out that there's something unique in the human mind, perhaps. I know some philosophers used to, but I'm not sure that's the case anymore. People use computers all the time now and they're much more...I don't know. I don't know.

I don't think there's anything special, and I think the people in artificial intelligence, at least at this time, did not think the human mind was special in a way that violated Turing universality.

He's saying another thing: that the hypothesis goes further than just saying the human mind can be simulated by a Turing machine. It's talking about its structure. How would it have to work, if it were simulated by a Turing machine?

"For it asserts, specifically, that the intelligent machine is a symbol system, thus making a specific architectural assertion, about the nature of intelligence systems. It is important to understand how this additional specificity arose."

OK. That's what I just said. "The roots of the hypothesis go back to the program of Frege and of Whitehead and Russell for formalizing logic, putting the notions of proof and deduction on a secure footing. Logic, and by incorporation all of mathematics, was a game played with meaningless tokens according to certain purely syntactic rules.

"Thus progress was first made by walking away from all that seemed relevant to meaning and human symbols. We could call this the stage of formal symbol manipulation." This does seem weird that, to understand intelligence, human intelligence better, we got rid of all meaning. [laughs]

We started talking about A's and B's, A or B, stuff like that. It's all just symbols. You would convert some English sentence that had a lot of meaning into a logical statement with just logical variables. You do some manipulation, and then you convert the variables back to the things they represented in the real world.

Seems weird, right? It seems like you would want to know more about the thing, not abstract it down to a letter. But that was the first step.

"This general attitude is well reflected in the development of Information theory. It was pointed out, time and again, that Shannon had defined a system that was useful only for communication and selection, which had nothing to do with meaning."

I don't know if you know this, but some people tried to change the name from information theory because, at that time, information was thought of as something that had a lot of meaning. But Shannon formulated it the way he did, and correctly. It was the right way to do it.

It was just zeros and ones and it wasn't about meaning or any information like what people thought of as information at that time. Now, we think of it like that. Yeah, sure. It's just bits.

Back then it was like, "Well, that's not really information. You can't learn anything from it. It's all about the amount of entropy in the system." That's it. Makes sense. Maybe the word is wrong, but it's changed now.

First nail in the coffin, here we go.

"A Turing machine consists of two memories, an unbounded tape and a finite state control. The tape holds data. The machine has a very small set of proper operations, read, write, and scan operations, on the tape. The read operation is not a data operation but provides conditional branching to a control state as a function of the data under the read head.

"As we all know, this model contains the essentials of all computers in terms of what they can do, though other computers with different memories and operations might carry out the same computations with different requirements of space and time.

"The model of a Turing machine contains within it the notions both of what cannot be computed and of universal machines. The logicians Emil Post and Alonzo Church arrived at analogous results on undecidability and universality. In none of these systems is there, on the surface, a concept of the symbol as something that designates."

This is another step in mechanizing thought, mechanizing decisions, mechanizing all this stuff. It's now all about zeros and ones on this tape. There's nothing about something that designates. There's no meaning.

"The data are regarded as just strings of zeros and ones. Indeed, that data can be inert is essential to the reduction of computation to physical process." There's no meaning anymore in this stuff. It's all zeros and ones, and basic operations of reading and writing and moving the tape.

"What was accomplished at this stage was half the principle of interpretation, showing that a machine can be run from a description. Thus, this is the stage of automatic formal symbol manipulation." We can represent program as data. That's the second step.

"With the development of the second generation of electronic machines in the mid-'40s, came the stored program concept. This was rightfully hailed as a milestone, both conceptually and practically. Programs now can be data and can be operated on as data.

"The stored program concept embodies the second half of the interpretation principle. The part that says that the system's own data can be interpreted, but it does not yet contain the notion of designation of the physical relation that underlies meaning." That was step three. Now, the last step, I think.

"The next step taken in 1956, was list processing. The contents of the data structures were now symbols in the sense of our physical symbol system, patterns that designated that had reference.

"That this was a new view was demonstrated to us many times in the early days of list processing, when colleagues would ask where the data were -- that is, which lists finally held the collections of bits that were the content of the system. They found it strange that there were no such bits, there were only symbols that designated yet other symbol structures."

They just won this award. They're allowed to talk about their work, but they're saying that their list processing, the IPL language was a breakthrough on the way to this hypothesis. They give a linear progression of logic, then Turing machine, then stored program computer and then list processing. All four of those steps lead you to this hypothesis.

"List processing is simultaneously three things in the development of computer science. One, it is the creation of a genuine dynamic memory structure in a machine that had heretofore been perceived as having fixed structure.

"It added to our ensemble of operations those that built in modified structure in addition to those that replaced and changed content. Two, it was an early demonstration of the basic abstraction that a computer consists of a set of data types and a set of operations proper to these data types.

"Three, list processing produced a model of designation, thus defining symbol manipulation in the sense in which we use this concept in computer science today." This is where he's...or they...Sorry.

They are presenting their work as a tangent to this hypothesis that they're presenting. They're just describing IPL, that you could have this dynamic memory structure, you can allocate little linked list nodes, and build lists dynamically, and change them and make interesting structures.

That you didn't have to have a fixed set of data that was in data statements at the end of your program, like a lot of languages did. The idea that it had different data types with operations that operated on those data types, that's interesting too. Of course, this model of designation, which they've already talked about.
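The data structure at the heart of this is tiny. Here's a sketch of the cons-cell idea in Python; it's the style of IPL and Lisp lists, not their actual syntax:

```python
# The core of list processing: dynamically allocated cells, each holding
# a symbol (or another list) and a reference to the rest. A sketch of the
# idea behind IPL and Lisp lists, not their actual syntax.

def cons(head, tail):
    return (head, tail)  # allocate a fresh cell at runtime

nil = None

# Build the list (whale mammal) dynamically, then extend it.
facts = cons("whale", cons("mammal", nil))
facts = cons("is-a", facts)  # lists grow and change at runtime

def print_list(cell):
    while cell is not nil:
        head, cell = cell
        print(head, end=" ")
    print()

print_list(facts)  # => is-a whale mammal

# Symbols can designate other structures: no "final" bits, just references.
lists = {"fact-1": facts}
lists["fact-2"] = cons("see", cons("fact-1", nil))  # a list referring to another list
```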

"The conception of list processing as an abstraction created a new world in which designation and dynamic symbolic structure were the defining characteristics. We come now to the evidence for the hypothesis, that physical symbol systems are capable of intelligent action, and that general intelligent action calls for a physical symbol system.

"The hypothesis is an empirical generalization and not a theorem." He said that so many times. "Our Central aim however is not to review the evidence in detail, but to use the example before us to illustrate the proposition that computer science is a field of empirical inquiry, hence we will only indicate what kinds of evidence there is and the general nature of the testing process."

I said before, that this lecture is like the best short summary of artificial intelligence, this paradigm of artificial intelligence, that I've ever read.

That's where it starts, right here: it's going to be a lot of AI from now on. It's interesting, because I feel like there's quite a lot more to say on the main thesis. They seem to treat these examples as necessary support. Although I think they're very, very interesting, I don't think they're such great support for the main thesis, that computer science is empirical.

I do think that they just wanted to show their work, to illustrate with their work. They spend, I don't know, five or six pages on AI, and a lot of it is their work. "20 years of work in AI has seen a continuous accumulation of empirical evidence of two main varieties. The first addresses itself to the sufficiency of physical symbol systems for producing intelligence.

"The second kind of evidence addresses itself to the necessity of having a physical symbol system, wherever intelligence is exhibited. The first is generally called artificial intelligence, the second is research in cognitive psychology." He's dividing their work into two fields. One is more, how do we build these systems that are intelligent? How do we make our chess program better, that kind of thing?

The other is research into humans. How do they think, what kinds of models can we develop, that kind of thing. "The basic paradigm for the initial testing of the germ theory of disease was identify a disease, then look for the germ. An analogous paradigm has inspired much of the research in artificial intelligence.

"Identify a task domain calling for intelligence, then construct a program for a digital computer that can handle tasks in that domain.

"The easy and well-structured tasks were looked at first. Puzzles and games, operations, research problems of scheduling and allocating resources, simple induction tasks. Scores, if not, hundreds of programs of these kinds have by now been constructed, each capable of some measure of intelligent action in the appropriate domain."

This is an interesting analogy he's making that if you had a disease, look for the germ. This is more like, "If you have a problem that humans can solve, try to solve it with a computer." [laughs]

Then, of course, symbol system. This is, again, the first kind, addressing sufficiency. Can a physical symbol system exhibit intelligence? Is it sufficient for intelligence? Then the second part, which is cognitive psychology, is the necessity. We'll look at that. He hasn't gotten to that yet.

"From the original tasks, research has extended to build systems that handle and understand natural language in a variety of ways, systems for interpreting visual scenes, systems for hand-eye coordination, systems that design, systems that write computer programs, systems for speech understanding. The list, if not is, if not endless, at least very long.

"If there are limits beyond which the hypothesis will not carry us, they have not yet become apparent. Up to the present, the rate of progress has been governed mainly by the rather modest quantity of scientific resources that have been applied, and the inevitable requirement of a substantial system building effort for each new major undertaking."

[pause]

Eric: He's just saying that it's gotten more complicated, there's a long list of programs that do somewhat intelligent stuff. Of course, we know in the future that these things are still hard. [laughs] Interpreting visual scenes is not a solved problem, hand-eye coordination, designing, writing computer programs.

These are all things that we still find are not easy to write. They haven't been solved, but perhaps they did find little pieces that made some sense. "There has been great interest in searching for mechanisms possessed of generality and for common programs performing a variety of tasks.

"This search carries the theory to a more complete characterization of the kind of symbol systems that are effective in artificial intelligence." After writing all these programs, you start seeing some patterns, right? You want to find the pieces and parts that you can put together and reuse.

"The search for generality spawned a series of programs designed to separate out general problem-solving mechanisms from the requirements of particular task domains.

"The General Problem Solver was perhaps the first of these, while among its descendants are such contemporary systems as Planner and Conniver. More and more, it becomes possible to assemble large intelligent systems in a modular way from such basic components."

Sometimes I think that with the kinds of compute resources that we have available today, if we were to actually go back and rewrite these original systems, we might actually get a lot more out of them.

I wonder, though. I wonder if that isn't what graduate students do these days. Fire up an EC2 cluster and run a modern version of General Problem Solver on it. I think a lot of what they learned was this:

Knowledge was one of the big constraints. You would look at a thing and it would get stuck and you'd say, "Why is this not able to solve this problem?" And it turned out that, "Oh, the system needs to learn that. The system didn't know that a whale was a mammal."

We need to write that down and even then it runs a little longer, then like, "Oh, it needs to know the density of water at sea level." Let's write that in there.

"Oh, it needs to know that humans can't breathe under water." Let's write that. It becomes that we know billions of facts about the world, if you want to call them facts.

Then, once you try to solve problems, it's not really the logic [laughs] and the reasoning. It's that you don't know enough; your AI doesn't know enough. There's actually a cool project called Cyc, C-Y-C, that has the goal of creating a database of all these facts.

"If the first burst of research simulated by germ theory consisted largely in finding the germ to go with each disease, subsequent effort turn to learning what a germ was. In artificial intelligence, an initial burst of activity aimed at building intelligent programs for a wide variety of almost randomly selected tasks, is giving way to research aimed at understanding the common mechanisms of such systems."

Looking from 46 years in the future, I don't know if we got much farther. Sorry to say.

Now, he's talking about the other side of this, part two, which is about whether it's necessary to have a symbol system, using humans, human minds, as the subject of study. "The results of efforts to model human behavior with symbol systems are evidence for the hypothesis, and research in artificial intelligence collaborates with research in information-processing psychology.

"Explanations of man's intelligent behavior in terms of symbol systems has had success over the past 20 years, to the point where information-processing theory is the leading contemporary point of view in cognitive psychology." These are broad statements but pretty strong.

The last point is that information-processing theory is the leading view in cognitive psychology. That's fascinating. I don't know enough about cognitive psychology to be able to evaluate it. But this idea that the mind is doing information processing, modeled on the computer, now seems so clear as to be basically self-evident.

That it influences how we understand the brain, and how we understand, say, the human senses, as information processing is an interesting development. It's another way the computer is influencing other sciences.

"Research and information-processing psychology involves two main kinds of empirical activity. The first is the conduct of observations and experiments on human behavior in tasks requiring intelligence. The second is the programming of symbol systems to model the observed human behavior.

"The psychological observations lead to hypotheses about the symbolic processes the subjects are using, and these go into the construction of the programs. Thus, many of the ideas for the basic mechanisms of GPS were derived from careful analysis of the protocols that human subjects produced while thinking aloud during the performance of a problem-solving task."

Too many words, man. Too many words. This, I feel, is a little weak, that he's referring again to GPS. What's weak about it is that he clearly divided the hypothesis in two, and said that AI was working on one side and cognitive psychology was working on the other half.

Now, he's saying that cognitive psychology was influencing the other half, and I just wonder, well...Is it really divided? I feel like it might be better not to have divided them up. I don't know if that's a meta-commentary on his argument or what, but I think what's more interesting is the back and forth, right? The back and forth between cognitive psychology and artificial intelligence.

Cognitive psychologists learn something about how humans think, then the AI folks write it into their programs. That leads to a better program, and then you generalize that better program a little bit and tell the cognitive psychologists, "Hey, look for something like this, because it seems to work really well in our programs," and then maybe they find it.

I think that's much more interesting than the split. OK, "the absence of specific competing hypotheses as to how intelligent activity might be accomplished"...Oh, sorry, this is one of those places where I skipped a lot of words. He's talking about other evidence. It's negative evidence.

"The absence of specific competing hypotheses as to how intelligent activity might be accomplished. There is a continuum of theories usually labeled behaviorism to those usually labeled Gestalt theory. Neither stands as a real competitor to the symbol system hypothesis. Neither behaviorism nor Gestalt theory has demonstrated that the explanatory mechanisms account for intelligent behavior and complex tasks.

"Neither theory has anything like the specificity of artificial programs." Basically, he's saying, you can't translate a behaviorist model of the mind into something they can run on a computer. [laughs] To even conceive of what that would look like is making me chuckle, because behaviorist treats the organism as a black box.

They're super into the idea that, well, we can't really know what's happening in your head. It's only self-reported. We're just going to pretend like it doesn't exist or even postulate that it doesn't exist, and just look at the inputs and the outputs.

Like, we are giving you a question. You answer the question right. We reward you. If you answered a question wrong, we shock you or something, punish you. Of course, yeah, but how does it happen? How do you actually come up with the right answer? It's not even on the table. I think he's absolutely right there.

"Knowing that physical symbol systems..." Oh, he's gone on to second part. I should have said that. This is Part 2, heuristic search. The last section was all about physical symbol systems and this hypothesis that it was that symbol systems are sufficient and necessary for intelligence. This is heuristic search.

He's got a different hypothesis here. "Knowing that physical symbol systems provide the matrix for intelligent action does not tell us how they accomplish this. Our second example of a law of qualitative structure in computer science addresses this latter question, asserting that symbol systems solve problems by using the processes of heuristic search.

"This generalization, like the previous one, rests on empirical evidence, and has not been derived formally from other premises."

He keeps saying that. I think that's part of his point. It wasn't like Euclid coming up with the definitions of point and line and then deriving the rest from them. This is: we made a program, we ran it, we measured how good it was at chess, and then we did it again, and again, and we learned things from it.

This heuristic stuff, this is what he won the Nobel Prize for: that people use heuristics in their economic activity. They don't optimize or maximize. They satisfice, meaning they have some rule, or a set of rules, that they follow, and they pick the first option that satisfies the need they have. They're not trying to maximize all the time.

In fact, you can't, because to maximize you'd have to try out too many options, at least in your head. Simulate too many options and you just have analysis paralysis. You never get around to taking action.
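The contrast between maximizing and satisficing is easy to see in code. A toy sketch (mine, not Simon's):

```python
# Maximizing vs. satisficing, as a toy example (mine, not Simon's).

options = ["walk", "bus", "bike", "taxi", "train"]

def utility(option):
    # Imagine this is expensive: simulating an option to see how good it is.
    return {"walk": 2, "bus": 5, "bike": 6, "taxi": 7, "train": 8}[option]

# Maximizing: evaluate everything, pick the best. Cost grows with the
# number of options, which in the real world is enormous.
best = max(options, key=utility)

# Satisficing: take the first option that clears an aspiration level.
def satisfice(options, aspiration):
    for option in options:
        if utility(option) >= aspiration:
            return option  # good enough; stop searching

print(best, satisfice(options, aspiration=5))  # => train bus
```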

Another thing I want to say, and this is more of a personal thing: I worked as a computer scientist, a grad student working in a lab. I was able to generate a new experiment basically every day. I'd write some code, usually just modifying an existing program, during the day, I'd run it at night, and in the morning I'd look at the results.

I've experienced this firsthand: you do feel like you're learning, and you're able to formulate hypotheses and falsify them by the next day. It's very, very fast compared to other sciences. My wife, however, was a biologist, and she was generating one data point every few weeks sometimes.

We would talk about how discouraging it was to have so little data after working for so long, whereas I was generating data every day and throwing it away because it was not...It's like, "No, that's not the answer."

We're in a lucky situation where we have a system we can operate on so quickly, really because the system is so malleable. The nature of the scientific discovery process is just different. You're not trying to characterize a stubborn physical system.

What made her work difficult was that it was a cell from a certain part of the brain, so she had to get at the cell and get a good reading. It was hard, very physically demanding work with these little scaled-down probes. It's really hard work. Man, with computers, you just fire up a hundred machines if you need them these days.

Here's the heuristic search hypothesis. "The solutions to problems are represented as symbol structures, a physical symbol system exercises its intelligence in problem-solving by search -- that is, by generating and progressively modifying symbol structures until it produces a solution structure." I'm going to go through that again. "The solutions to problems are represented as symbol structures."

Remember, these are just expressions as we know them in Lisp. "A physical symbol system exercises its intelligence in problem-solving by search -- that is, by generating and progressively modifying symbol structures until it produces a solution structure."

Search: it's looking for the answer by taking a symbol structure and generating new ones that are, hopefully, better than the old one. "Physical symbol systems must use heuristic search to solve problems because such systems have limited processing resources.

"Computing resources are scarce relative to the complexity of the situations with which they are confronted. The restriction will not exclude any real symbol systems in computer or human in the context of real tasks."

This is just a summary of what he won the Nobel Prize for. What he's basically saying is, any physical system is limited. Like, in the Turing machine, it had an infinite tape and you could give it infinite time to find the answer to write all the digits of something.

He's trying to say, no, it needs to be practically limited, and it doesn't matter where you put the limit, but it has to, and it's always going to be more limited than the situation in which the system finds itself.

"Since ability to solve problems is generally taken as a prime indicator that the system has intelligence, it is natural that much of the history of artificial intelligence is taken up with attempts to build and understand problem-solving systems." Makes sense.

Now, he's going to talk about problem-solving. "To state a problem is to designate (1) a test for a class of symbol structures -- solutions of the problem -- and (2) a generator of symbol structures -- potential solutions. To solve a problem is to generate a structure, using (2), that satisfies the test of (1)."

You have two pieces: a generator that's generating potential solutions, and a checker that's checking whether those potential solutions are actual solutions. It's a test. I called it a checker; it's a test. "A symbol system can state and solve problems because it can generate and test."

This is another structural hypothesis: when a physical symbol system, a general intelligence, is solving problems, it must be doing something like generate and test.
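Generate and test is easy to write down as a skeleton. A minimal sketch, with my own framing and a toy problem:

```python
# Generate and test, the bare skeleton. (My framing, not their code.)

def solve(generate, test):
    """Stating a problem = designating a generator of potential solutions
    and a test for actual solutions. Solving = generating until the test passes."""
    for candidate in generate():
        if test(candidate):
            return candidate
    return None

# Toy problem: find an x with x * x == 1369.
def generate():
    n = 0
    while True:
        yield n
        n += 1

print(solve(generate, lambda x: x * x == 1369))  # => 37
```

Everything interesting lives in how smart the generator is, which is where he's headed next.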

"A simple test exists for noticing winning positions in chess. Good moves in chess are sought by generating various alternatives and painstakingly evaluating them with the use of approximate measures that indicate that a particular line of play is on the route to a winning position."

You're generating, let's say, all possible moves. Can this piece move here? Yes. Is that a good move? Then you evaluate whether it's going to lead to a win.

"Before there can be a move generator for a problem, there must be a problem space. A space of symbol structures in which problem situations including the initial and goal situations can be represented. How they synthesize a problem space and move generators appropriate to that situation is a question that is still on the frontier of artificial intelligence research."

Framing the problem. Usually, the problem is framed by the writers of the program, and for a long time that was what you did as an artificial intelligence programmer. You're like, "What if we represented the problem this way?" And you got one percent better results.

"During the first decade or so of artificial intelligence research, the study of problem-solving was almost synonymous with the study of search processing. Consider a set of symbol structures, some small subset of which are solutions to a given problem. Suppose further, that the solutions are distributed randomly through the entire set."

You have this big space, and some elements, randomly distributed, are the solutions to the problem. "If no information exists that would enable any search generator to perform better than a random search, then no symbol system could exhibit more intelligence than any other in solving the problem." This is if it's random, right? If it's random, all you can do is just start looking.

Just pick up the first one, then the second one, then the third. It could be anywhere, so why does it matter which way you do it? "A condition, then, for intelligence is that the distribution of solutions be not entirely random, that the space of symbol structures exhibit at least some degree of order and pattern.

"A second condition is that pattern in the space of symbol structures be more or less detectable. A third condition is that the generator be able to behave differentially, depending on what pattern it detected. There must be information in the problem space and the symbol system must be capable of extracting and using it."

I want to summarize these again. He kind of already summarized it, but he's saying that to be able to respond intelligently, the solutions can't be random in there. There has to be some pattern and you have to be able to detect that pattern.

Your generator shouldn't just be generating zero, then one, then two. It's got to generate something better than that, because the structure is in there. And then it has to be able to act differently based on the pattern it detects. Let me say it again.

Can't be random. You have to be able to detect the pattern, and then you have to be able to act on that pattern. Generate different solutions, different potential solutions depending on the pattern you see.

Here, he's going to give an example. "Consider the problem of solving a simple algebraic equation. AX + B equals CX + D. One could use a generator that would produce numbers which could then be tested by substituting in the equation. We would not call this an intelligent generator."

We just generate all the numbers between negative one million and positive one million and test them all: replace X with the number, and does it satisfy the equation? That would not be intelligent. It's basically just as good as randomly trying them. It's brute force.

"Alternatively, one could use generators that would use the fact that the original equation can be modified by adding or subtracting equal quantities from both sides, or multiplying or dividing both sides by the same quantity without changing its solutions.

"We can obtain even more information to guide the generator by comparing the original expression with the form of the solution and making precisely those changes in the equation that leave its solution unchanged."

You can be smart about this generator. The generator can know something about the operations of addition, subtraction, multiplication, and division, and know how you can manipulate the equation without changing its solutions, so the equation still holds. You subtract B from both sides, stuff like that.

Now, that's one part. The second part is we know what the answer needs to look like. It has to be X equals some number, right? Or something. We want X on one side of the equal sign by itself.

We can compare the expressions we're generating to the desired form and generate ones that lead in that direction. Have we moved stuff that's not X to one side and stuff that is X to the other? You can have a kind of distance calculation: how far are we from a solution, and are we getting closer?

"First, each successive expression is not generated independently, but is produced by modifying one produced previously. Second, the modifications are not haphazard, but depend on two kinds of information. Information that is constant over this whole class of algebra problems and information that changes at each step."

The information that's constant is how the algebraic operations work; the information that changes at each step is the differences that remain between the current expression and the desired expression.

"In effect, the generator incorporates some of the tests, the solution must satisfy so that expressions that don't meet these tests will never be generated." Instead of testing brute-force, we're limiting what we generate to only things that get us closer. Now, we're talking about search trees.

"The simple algebra problem may seem an unusual example of search. We're more accustomed to thinking of problem-solving search as generating lushly branching trees of partial solution possibilities, which may grow to thousands, millions..." Nowadays billions, so the thing is the tree of the algebra problem does not branch. [laughs]

You can always know what to do next. It's clear. You just subtract some stuff out first, and then you divide some stuff, and you're done. You're always just getting a little bit closer with each move. If you're looking at chess, or more complex problems than just algebra problems, you've got branching. Now, what do you do?

"One line of research into game playing programs has been centrally concerned with improving the representation of the chessboard, and the processes for making moves on it, so as to speed up search and make it possible to search larger trees. On the other hand, there is good empirical evidence that the strongest human players seldom explore trees of more than 100 branches.

"This economy has achieved, not so much by searching less deeply than do chess-playing programs, but by branching very sparsely and selectively at each node. This is only possible by having more of the selectivity built into the generator itself."

Now notice, this is just like they said in the biography. These two researchers are more concerned with making an intelligent system that acts the way a human would act. They're not talking about how to make the search better just to make the program better.

They're saying, "We studying masters, chess masters, and they're not branching as much as we're branching. They must have a generator that's smarter than ours, so that's the direction we're going to go." They would much rather do that than come up with some more efficient way of testing and just brute force generate more moves.

"The somewhat paradoxical-sounding conclusion is that search is a fundamental aspect of a symbol system's intelligence, but that amount of search is not a measure of the amount of intelligence being exhibited." [laughs] [indecipherable 98:18] pretty funny thing to say. It's not, "We need more search. The more search, the smarter the system."

As people get better at a task, like chess, they actually search less. That's part of the intelligence: they recognize areas of the board that they don't even need to look at, or they see that it must be one of these 5 moves, not one of the 20 possible moves. They see that instantly. It reminds me of how the AlphaGo program works.

AlphaGo has two parts. It has the generator, and it has the tester. The generator is itself a deep neural net. It has been trained on lots and lots of games to recognize potential good moves. It will generate what it considers the best moves first. This is just a generator. It's just looking at the board and saying, "Maybe here. Maybe here. Maybe here. Maybe here."

Then there's a test, which looks deeply into down the tree of possible moves. If I move here, then they're going to move here. It looks down that. It does some kind of sampling. It doesn't go all the way to a win. It does sampling to limit the amount of search that has to be done. That's how it works.
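That generator-plus-sampling-test shape is easy to sketch. Here's a toy version in Python using a trivial counting game (players alternately add 1, 2, or 3 to a running total; whoever reaches exactly 21 wins). It's nothing like AlphaGo's actual networks; the heuristic prior, the rollout test, and all the names here are mine.

```python
# Generator + sampling test, the AlphaGo shape in miniature, on a toy
# counting game. Nothing like AlphaGo's actual networks; names are mine.
import random

TARGET = 21

def legal_moves(total):
    return [m for m in (1, 2, 3) if total + m <= TARGET]

def generate(total):
    """The 'generator': propose moves, best-looking first. Here the prior
    is a simple heuristic: prefer totals of 21, 17, 13, ... (multiples of
    4 below the target), which leave the opponent stuck."""
    return sorted(legal_moves(total),
                  key=lambda m: (total + m) % 4 != TARGET % 4)

def rollout(total, my_turn):
    """The 'test': sample one random playout and report whether we won."""
    while total < TARGET:
        total += random.choice(legal_moves(total))
        my_turn = not my_turn
    return not my_turn   # whoever just moved onto 21 won

def choose_move(total, top_k=2, samples=200):
    """Only the generator's top_k proposals get evaluated by sampling."""
    scored = []
    for move in generate(total)[:top_k]:
        wins = sum(rollout(total + move, my_turn=False) for _ in range(samples))
        scored.append((wins, move))
    return max(scored)[1]

print(choose_move(14))  # 14 + 3 = 17 leaves the opponent stuck; expect 3
```

Only the generator's top proposals ever get evaluated, which is the "branching very sparsely and selectively at each node" idea from the lecture.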

Here's the thing. I've heard people say that just the generator, without refining its choices with a score of how likely we are to win down a given branch, is already a really high-level player. It would beat an amateur. If you just took the move it said was the best one it could see, boom. It doesn't even look ahead. [laughs]

It just says, probably there, move there. It's not analyzing deeper than that. It just looks at the board: I think there. That player would beat you, unless you're a master. I don't know who you are, but it would beat me, and I know how to play Go.

It's not that it's beating me because I don't know the rules or something like that. I know the rules. I'm not very good, but it would beat me. I find that really amazing, that the generator has been trained to be that smart.

Then there's the test, which is also pretty cool. That's sampling to see, how likely are we to win if we go down this branch? It's not actually looking at all the possibilities. It's just sampling. "When the symbolic system knows enough about what to do, it simply proceeds directly toward its goal.

"Whenever its knowledge becomes inadequate, it must go through large amounts of search before it finds its way again." That's really interesting. I think that is very human. When we don't know much about a domain, we have to do a lot of trial and error, a lot of search, basically. The more you learn, the more directly you go to the answer.

"The potential for exponential explosion warns us against depending on the brute force of computers, as a compensation for the ignorance and unselectivity of their generators. The hope is still ignited in some human brains that a computer can be found that is fast enough, and that can be programmed cleverly enough, to play good chess by brute-force search."

This makes me think a lot about AlphaGo again. I don't remember the exact figure, but the cost of just training AlphaGo (not all the experiments that led up to how it works, and all the programming time, but just the training, like, "Let's train this thing on billions of games") was millions of dollars.

At some stage, we are doing what he said, right? Training a neural net can also be modeled as search. We're deferring the problem, right? The learning part is now this brute-force search, basically, and we have the resources to do it. That scares me. It scares me.

I often think that the way we're doing neural net training these days, with the number of data points and the amount of processing time required, is more on the scale of evolution than of learning within your lifetime.

It's like rerunning, recapitulating the development of the visual cortex to do vision, right? It's not saying, "Well, we have a visual cortex. We know how to make one of those. Now, let's teach it to recognize cats."

It's saying, "Let's just start with nothing. Just big neural net, and show it millions and billions of cats and other stuff, and it'll distinguish between them. The weights of those neural nets will determine the structure that is needed to do that."

That's why it takes so much time to train these things, and so many data points. It's like running through the whole Cambrian explosion. This is a metaphor, of course. It's not actually recapitulating those things.

A lot of the Cambrian explosion happened because of the development of the eye. It started with a little patch that's light-sensitive. Now you could see, and you could see the difference between light and dark. Then predators could see what they were trying to eat, so prey had to learn to see predators to get away from them.

Boom! This big explosion of different strategies for survival. That's what I see, and that's why it takes that many resources and that much energy, if you just look at the energy expenditure. Of course, it's much more directed. You don't have to worry about eating, you don't have to worry about finding a mate. All that stuff is gone.

It's just, play Go. [laughs] Just recognize cats. It's still on that same order of magnitude, the same amount of culling and shaping of the neural net.

Of course, there's reuse of neural nets. You can train a neural net to do one task, let's say a visual task, and then chop off the last couple of layers and put new ones on. It's already got most of the visual cortex in there, so you don't have to redo that again.
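Here's a minimal sketch of that head-swapping idea in PyTorch. This is my illustration; torchvision's resnet18 is just a convenient stand-in for "a network that already learned a visual task," and the 2-class cat/not-cat problem is hypothetical.

```python
# A sketch of reusing a trained network for a new task (assumes torch and
# torchvision are installed); an illustration, not anyone's production code.
import torch.nn as nn
from torchvision import models

# Start from a network already trained on a big visual task.
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the early layers -- the "visual cortex" we don't want to redo.
for param in model.parameters():
    param.requires_grad = False

# Chop off the final layer and put a new one on for the new task,
# e.g. a hypothetical 2-class cat/not-cat problem.
model.fc = nn.Linear(model.fc.in_features, 2)
# Only model.fc's weights will be updated during training.
```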

Still, it seems like we're not doing this the way he was talking about where we're looking at how people see and building that into our system. Well, I won't comment on that anymore.

He's talking a lot about AI and I have a master's degree in AI so I have opinions on this stuff. "The task of intelligence is to avert the ever-present threat of the exponential explosion of search.

"The first route is to build selectivity into the generator. The usual consequences to decrease the rate of branching not prevented entirely. Ultimate exponential explosion is not avoided, but only postponed. Hence, an intelligent system generally needs to supplement the selectivity of its solution generator with other information using techniques to guide search."

You can't prevent exponential explosion, certainly not in general, so we need more information-using techniques. We've got to guide the search. What does that mean? Which path do we go down?

We've got a branch here, 10 different things, and you've got to choose which one to go down. "20 years of experience with managing tree search in a variety of task environments has produced a small kit of general techniques, which is part of the equipment of every researcher in artificial intelligence today.

"In serial heuristic search, the basic question always is, what shall be done next? In tree search that question in turn has two components. One, from what node in the tree shall we search next? Two, what direction shall we take from that node?" I think that's pretty self-explanatory.

"The techniques we have been discussing are dedicated to the control of exponential expansion rather than its prevention. For this reason, they have been properly called weak methods. It is instructive to contrast a highly structured situation which can be formulated say as a linear programming problem.

"In solving linear programming problems, a substantial amount of computation may be required, but the search does not branch." He just wants to say that it's not really about the amount of computation. It's more about this branching.

He talks about some other...what do you call it? Other approaches besides this generate-and-test approach, but I'm not going to go down there. "New directions for improving the problem-solving capabilities of symbol systems can be equated with new ways of extracting and using information. At least three such ways can be identified." He's talking about future work. Like, where do we go from here?

I do want to say, I think it's clear by now, from how much I've read about AI, that this is a pretty good summary of AI, at least the paradigm where it was about search: represent the problem, generate solutions in the problem space, test whether they're a good solution. How you generate better solutions is all about search.

I think they've lost the thread of science at this point. [laughs] They're just summarizing AI. Now, they're going into future work in AI. Again, that strengthens what I'm saying. This is good evidence for that.

"New directions for improving the problem-solving capabilities of symbol systems can be equated with new ways of extracting and using information. At least three such ways can be identified.

"First, it has been noted by several investigators that information gathered in the course of tree search is usually only used locally to help make decisions at the specific node where the information was generated.

"Information about a chest position is usually used to evaluate just that position, not to evaluate other positions that may contain many of the same features. Hence, the same facts have to be rediscovered repeatedly at different nodes of the search tree."

A simple example: sometimes you have two moves that, if you do them in different orders, produce the same position. Well, let's make it easy. First, you move this pawn, and they move their pawn. Then you move this other pawn, and they move their pawn. The board doesn't care which order you moved the pawns in. It's the same end state, but in the way you generate the game tree, those are two different branches.

The fact that the board looks the same in those two different branches, the program isn't using that information. We've already evaluated this board, but it was in the other branch. It's hard to do.

"A few exploratory efforts have been made to transport information from its context of origin to other appropriate contexts. If a weakness in a chess position can be traced back to the move that made it, then the same weakness can be expected in other positions descended from the same move."
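Here's a toy sketch of that reuse (my illustration, using Python's lru_cache as a stand-in for a real transposition table): cache evaluations by position, so a board reached through a different move order doesn't get rediscovered.

```python
from functools import lru_cache

# lru_cache as a stand-in transposition table: evaluations keyed by
# position, so equal positions reached in different branches are reused.
@lru_cache(maxsize=None)
def evaluate(position):
    """Expensive static evaluation, computed once per distinct position."""
    print("evaluating", sorted(position))   # shows when real work happens
    return sum(len(move) for move in position)

# Two move orders that happen to reach the same position:
line_a = ("e4", "e5", "d4")   # 1. e4 e5 2. d4
line_b = ("d4", "e5", "e4")   # 1. d4 e5 2. e4

evaluate(frozenset(line_a))   # computed and cached
evaluate(frozenset(line_b))   # same position, different order: cache hit
```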

"Another possibility. A second act of possibility for raising intelligence is to supply the symbol system with a rich body of semantic information about the task domain it is dealing with. For example, empirical research on the skill of chess masters shows that a major source of the master's skill is to recognize a large number of features on a chessboard.

"The master proposes actions appropriate to the features recognized." Just having this encyclopedic knowledge of the features like, "Oh, this shape here. I've got to move my knight to block this..." They're thinking at this high level. They're not thinking at the low level that usually the chess programs are thinking.

"The possibility of substituting recognition for search arises because a rare pattern can contain enormous amounts of information. When that structure is irregular and not subject to simple mathematical description, the knowledge of a large number of relevant patterns may be the key to intelligent behavior.

"Whether this is so, in any particular task domain is a question more easily settled by empirical investigation than by theory." This goes back to the ghost stuff where neural nets are really good at detecting these patterns and learning these patterns, so that could be more know what is talking about. Why go programs can work now?

"A third line of inquiry is concerned with the possibility that search can be reduced or avoided by selecting an appropriate problem space." This is interesting. Some problems can be solved by using a better representation, and in that new representation, the solution is way more obvious.

Maybe you could say it requires less branching. If you try to do it in the regular representation, you have a hard problem, but in this other one, it's much easier. He gives an example where you have a checkerboard and 32 tiles that are one-by-two rectangles. Each tile can cover two squares, right?

You have 32 of these, right? You can cover the whole board with these 32 tiles. They each cover two squares, and there are 64 squares on a checkerboard. All right. Now, here's the problem. You remove opposite corners of the checkerboard, so you've removed two squares.

Can you now tile it with 31 of these tiles? Can you cover it? Well, you could brute-force it and try every single possible combination, but you could also change the representation. You could say, "Wait a second. Each tile covers both a black and a white square."

Because it's a checkerboard pattern, you don't have two black squares next to each other or two white squares next to each other. Every one-by-two tile has to cover a black and a white. If we remove two opposite corners, those are always the same color. We remove two whites, or we remove two blacks.

Now there's no way, because if we remove the two white corners, there are two blacks left that don't have a white to pair with, so we can't place the tiles. Wow, we solved the problem. We didn't have to brute-force the search.
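You can check the coloring argument in a few lines (my own illustration): color square (row, col) by (row + col) % 2 and count what's left after removing the corners. A domino always covers one square of each color, so equal counts are necessary for a tiling.

```python
# The coloring argument for the mutilated checkerboard, as a quick check.
def color_counts(removed):
    counts = [0, 0]
    for row in range(8):
        for col in range(8):
            if (row, col) not in removed:
                counts[(row + col) % 2] += 1
    return counts

print(color_counts(set()))             # [32, 32]: 32 dominoes can work
print(color_counts({(0, 0), (7, 7)}))  # [30, 32]: opposite corners share a
                                       # color, so 31 dominoes can't cover it
```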

How do you find that representation? We've avoided all this search by changing the representation. Now, he says, "Perhaps, however, in posing this problem, we are not escaping from search processes. We have simply displaced the search from a space of possible problem solutions to a space of possible representations. This is largely unexplored territory in the domain of problem-solving research."

I said before that often in AI, it's still the human who represents the problem. Even with neural nets, you come up with this space of vectors that's supposed to represent all possible solutions. I'm going to end now by reading what I read at the beginning. Maybe it will make more sense. A.M. Turing concluded his famous paper on "Computing Machinery and Intelligence" with the words, "We can only see a short distance ahead, but we can see plenty there that needs to be done."

Many of the things Turing saw in 1950 that needed to be done have been done, but the agenda is as full as ever. Perhaps we read too much into his simple statement above, but we like to think that, in it, Turing recognized the fundamental truth that all computer scientists instinctively know.

For all physical symbol systems, condemned as we are to serial search of the problem environment, the critical question is always, what to do next?

Quite a paper, well worth a few reads. It's by two of the greats who, unfortunately, are easy to forget about and overlook, but, man, their work was really important for artificial intelligence.

I don't know if I've stated this before sometime on the podcast, but I feel like the artificial intelligence project, the original one, to create machines that can do things that normal humans can do, that was very important in computer science. It's often pooh-poohed as like, "Oh, what has AI ever done?"

We have to remember that even stuff like parsing and compiling a Fortran program was considered artificial intelligence. It was considered automatic programming, so we have to give them credit.

What happens is they generate all this cool stuff on the way to trying to make a better chess program, and then it becomes more widely practical and leaks out into the rest of computer science. That's the joke: once it works, they stop calling it AI, and it's just computer science.

Look at the list of stuff that IPL had: data structures, data types and their operations, dynamic memory allocation. This is stuff we take for granted these days, but it was part of the path to artificial intelligence. Without that path, I feel like the situation of computing today would be a lot poorer than what we have now.

Thank you, Herb Simon and Allen Newell. I do want to recommend Herb Simon's book, The Sciences of the Artificial. It goes much deeper than this short lecture did. That's all I have to say.

Thank you for listening, and as always, rock on.