What is memoization?

This is an episode of Thoughts on Functional Programming, a podcast by Eric Normand.

Subscribe: RSSApple PodcastsGoogle PlayOvercast

Memoization is a higher order function that caches another function. It can turn some slow functions into fast ones. It saves the result of a function call after the first time to the cache, so if you call the function again with the same arguments, it will find it in the cache.

Transcript

Eric Normand: What is memoization? In this lesson, you're going to learn this simple and powerful technique, and how to implement it. My name is Eric Normand. I help people thrive with functional programming. This technique is pretty cool. It's really important for the fact that it can trade space for time. Meaning, it uses more memory to save computation time.

Sometimes, it can turn a slow function into a fast one...Sometimes, and we'll talk about when. What is it? Memoization is a cache. You can imagine you have this function that when you call it, it does all sorts of work. It takes some time. If you call it enough with the same arguments, you could say, "Maybe I should start caching my results, so I don't have to recompute the same things over and over."

The first time, you have to compute it. It's going to be slow. The second time, you can look it up in the cache. You don't have to recompute it. That's all memoization is. It's just an automatic way of doing that. Instead of using boilerplate of every time you realize you have a slow function, you write some code at the top to store the values in a cache, check the cache before you compute it.

Memoize is a higher-order function that takes your function and can do all that caching stuff for you. When do you use this? We're going to get into how to implement it in just a second, but it's really important to know when to use it because it can't just magically make everything faster. The main way to tell if you should use memoization is if you're calling the same function many times with the same arguments.

Otherwise, it's not going to help you. If you're always doing different arguments, caching it is not going to help, at all. If you're using the same arguments over and over, this could really be helpful. The first time you have to calculate it. Then every other time, it's very fast. It's just a look-up. How does it work? Let's just step through writing our own version of memoize.

You write a function and it has one argument, and that argument is another function. It's going to return a new function that is a memoized version of that function. This is a higher-order function. It takes a function as an argument. It also returns a function as an argument. Inside the function, it's going to create a little cache. That's just a hashmap, OK? It's just a hashmap. It can be mutable.

Probably should be mutable, because you're going to add to the cache. Whenever that function that is returned is called, you take the arguments and you look them up. It's like the arguments are in an array or a list or something. You look them up in the hashmap. If it's empty, if there's nothing there, you're going to have to...you have nothing in the cache for those arguments.

If there is something, you just return it. You return the value for that list of arguments. You've saved the computation. You don't have to do it again. If there's nothing there, then you call the function that you're memoizing, that argument. You call it with the arguments, you save the return value to the cache, and then you return it. Then the next time, it'll be in the cache and it'll be faster.

That's it. It's just a very simple function. It's something that you could add manually yourself to the top of every function. You could say, "Check the cache. If it's not there, then here's the logic for it." This is just a way of doing it in a reusable way. This is a higher-order thinking kind of way. let's talk again about why you would do this. It uses memory. Now you're starting to store all the memory lists in that hashmap.

Also, the return value could take a lot of memory, too. You have to keep those around, so it's going to take memory. As you run this function more and more times with different arguments, you're storing more and more stuff in that hashmap. You have to be mindful that you're not going to just save all this stuff forever. There's a pattern of usage of this technique.

Which is that you can create a locally memoized version of it. Meaning, instead of having the global function that you call be memoized globally, so everyone is using the same cache. You might want that. That might be just fine, but this other technique is in a local context. Inside of a scope, you define a new function that is the memoized version of some other function.

Because you know locally, you're going to reuse that function a lot with the same arguments and call it many times. Then at the end of the scope, it goes out of scope. The function is garbage collected. The cache is garbage collected. You've freed that space again. You have control over how things are memoized, where things are memoized, the scope of memoization.

I think that that's a useful technique to remember, that you don't have to just memoize it globally. You can memoize it at any level, because you understand how things are working. Now, you should only memoize on things that will return the same thing everytime. It's a cache. You shouldn't cache things like sending an email. I sent an email, and it returned true to tell me it worked.

If I try to send it again, it just uses the cache. No, that doesn't make sense. It's clear that it doesn't make sense. A pure function? A calculation? Yes, definitely memoizable. Some other things that are memoizable are things that are cache-able, so you could say that I'm going to do this get request. That takes a while in computation terms. It has to do a network request, has to download all these bytes.

That's expensive, in terms of time, so if I'm going to be doing a get request, I don't really want to think about how many times I'm doing it, but I only want to do it once. Just to save time. You could cache that. You could say, "I'm caching this get request." That's fine. You've memoized this function that does that get request. Sure, that makes sense, because the get method on http is defined as cache-able

They call it 'item potent.'; It means it is allowed to be cached. That's another use maybe, reading something from the hard drive. If you know it's not going to be changing because it's a slow kind of operation...Sure, cache it. Why not? One thing you could do is make a local cache for the database. So, you do some queries of the database. Inside of this one request, you make a local memoization.

Every time you do the same query, you get the same answer. It's just an easy way to cache. All right, so, let's recap. Memoization is a cache of a function's results. Same arguments, same results. Instead of calculating it a second time, you can save time and just look it up in the cache. It's useful, mostly, when you're going to be calling the same function with the same arguments, over and over again.

You pay the cost once, and now, it's all coming from the cache. It's just a higher-order function that sometimes looks up stuff in the cache, if it's there. Sometimes it calls your function and saves the value to the cache before returning it. You can do it locally, you can do it globally, whatever level of scope you need it to be at. It trades space for time, so be mindful of it using a lot of space.

Do yourself a favor. Try to implement this. Implement some kind of memoization system yourself, whether you're doing a higher-order function or, maybe, you're an object-oriented language. You can use a class with a method that takes another class, and it knows the method to call on it. Cool. I'd like to hear about your adventures doing that.

If you enjoyed this episode, you can find all the past episodes at livecast.com/podcast. There you'll find all the past episodes with audio, video, and text transcripts. You'll also find links to subscribe, and to find me on social media. For longer form discussions, email is probably better, but if you just have a quick question or something, Twitter is just great. Thank you for watching, and rock on.