Pre-West Interview: Melanie Mitchell

Talk: Visual Situation Recognition: An Integration of Deep Networks and Analogy-Making

Melanie Mitchell's keynote at Clojure/West is about building computer systems that recognize visual situations.


Melanie Mitchell is a researcher in Artificial Intelligence, complex systems, and machine learning. She did her PhD with Douglas Hofstadter; that work resulted in Copycat, which was featured prominently in Hofstadter's book Fluid Concepts and Creative Analogies. Perhaps coincidentally, Copycat was written in Lisp. The branch of Artificial Intelligence that she works in is probably different from what you would read about in most textbooks or learn in college courses.

As an introduction to her work, I would recommend Complexity: A Guided Tour (YouTube) for a good overview of complexity in general, and Using analogy to discover the meaning of images (YouTube) for her work with analogies.

Why it matters

Alex Miller (the conference organizer) is a fan of Hofstadter's. He invited Hofstadter to keynote at Strange Loop in 2013. He also had Jeff Hawkins talk about modeling the neocortex at Strange Loop 2012.

About Melanie Mitchell

Homepage - Twitter - Author page on Amazon


Melanie Mitchell generously agreed to do an interview about her talk at Clojure/West. The background to the talk is available, if you like.

Interview

LC: Do you program in Clojure? If not, have you heard of it? If so, how did you get into it?

Melanie Mitchell: No, I don't program in Clojure, though I did write the code for my Ph.D. dissertation project in Common Lisp. I have heard of Clojure, since I'm interested in functional languages, and also some of my students have been programming in it.

LC: I'm no expert in AI, but I've always seen three big approaches. There are the two Peter Norvig approaches, namely the rule-based approach and the statistical approach. And then there was the approach laid out in Fluid Concepts and Creative Analogies, which was much more about building a system that was unpredictable but whose emergent behavior was reminiscent of human intelligence. The rule-based approach has been dismissed as too brittle. And the statistical approach has had broad impact. I have to admit that I haven't heard much about the third approach in a while. Could you talk about it briefly and some of the recent advances?

MM: You're right --- statistical machine-learning approaches have recently seen some huge successes in applications such as object-recognition, image captioning, speech-recognition, and language translation. The major ideas behind statistical machine-learning approaches, including so-called "deep networks", have been around for a while. Let's use the example of visual object-recognition. The idea is that a system is presented with many images of the concept it is supposed to learn, e.g., "dog", along with many "non-dog" images as well, and the system essentially learns how to map images into a high-dimensional "feature" space in which dog images are close to one another and non-dog images are far from dog images. The "statistical" aspect comes in because the visual features used can be statistical in nature, and also statistical methods are used to find the best mapping.

So why have these methods become so successful only recently? It seems that the success is due to the fact that we now have access to huge amounts of data to train these systems, as well as very fast parallel computing methods to perform the training. These kinds of methods have become very successful in tasks such as recognizing faces and specific objects (e.g., "dog") in images.

The problem with these kinds of purely statistical methods is that it's hard --- maybe impossible --- to learn to recognize more abstract visual situations, such as "taking a dog for a walk" or "training a dog" or "a dog show". Such situations consist of multiple objects that are related to one another in specific ways --- e.g., "taking a dog for a walk" typically consists of a person, a dog, and a leash, which have specific kinds of spatial and action relationships to one another. Moreover, there are lots of "dog-walking" situations that do not fit the prototype --- e.g., a person walking multiple dogs, a person "walking a dog while riding a bike", a person running rather than walking, etc. Humans are of course very good at learning to recognize abstract situations in a flexible way --- to allow "slippages" from a prototypical situation.

The general architecture described in Fluid Concepts and Creative Analogies has been termed the "active symbol" architecture. The idea is that statistical methods interact with a network of symbolic-level structured activatable concepts. The activity of concepts in the network controls, and is controlled by, stochastic agents that bridge the gap between symbolic knowledge and statistical features. The various chapters in Fluid Concepts and Creative Analogies describe how this architecture was developed for "micro-domains", such as letter-string analogies. I'm now extending this architecture to interface with deep network approaches for the task of visual situation recognition. This is what I will describe in my talk.

LC: Is there any background material you would suggest people read/watch before the talk?

MM: I'll try to make the talk as self-contained as possible, but if people are interested, they might want to read a (somewhat long) chapter from Fluid Concepts and Creative Analogies, which I've put online. This describes the Copycat analogy-making project in some detail.

LC: Where can people follow your work online?

MM: I have a web page on this project. This is where I will be posting papers and updates on the project.

LC: If Clojure were a food, what food would it be?

MM: Cream. (Because you put it on top of Java...)