What are race conditions?

This is an episode of Thoughts on Functional Programming, a podcast by Eric Normand.

What is a race condition? We look at what causes race conditions and some ways you can avoid them.

Transcript

Eric Normand: What is a race condition? Hi, my name is Eric Normand. These are my thoughts on Functional Programming. You should subscribe so you can get every episode as it comes out.

Race conditions are very interesting and fun phenomena that happen when we have multiple threads, or even multiple machines, communicating with each other and coordinating to get a job done. Let's talk about the multiple-thread situation first to make it simpler.

Let's say, I have a system that counts tomatoes. There are two cameras taking pictures of my tomato farm. I have a thread for each camera that is running some computer vision system to count the tomatoes. Every time the thread finds a tomato, it will execute the ++ operator.

If you're not familiar with the ++ operator, it increments a variable. These two threads are sharing the same global variable. Let's call it tomato count. They're both executing the same code. They're executing tomato count ++. The increment operator, ++, does three separate operations. It reads in the value from the global variable.

It adds one to it. It stores that value back in the variable. Because there are three different operations, and each thread is doing those three operations, they can interleave in different ways. The easy case is when, say, thread A does the whole thing before thread B does. Thread A reads, adds, and stores, and then thread B reads, adds, and stores.

Let's say the variable is at 17 right now. Thread A counts one, so then it's at 18, and then thread B counts one, and it's 19. That's the correct answer. Each thread counts a new one and we have two more than we had before. If thread B goes first, and then thread A, we get the same answer.
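To make the easy case concrete, here is a minimal Python sketch that spells out the three steps hidden inside the increment. The names `tomato_count` and `increment` are made up for illustration, and the two "threads" are simply two calls made one after the other.

```python
# Shared global, standing in for the "tomato count" variable.
tomato_count = 17

def increment():
    """One increment, written out as its three separate steps."""
    global tomato_count
    value = tomato_count   # 1. read the shared variable
    value = value + 1      # 2. add one
    tomato_count = value   # 3. store the result back

increment()  # thread A does the whole thing first
increment()  # then thread B does the whole thing
print(tomato_count)  # 19
```

Because each call finishes all three steps before the next one starts, no count is lost.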

That's not a race condition because they get the same answers, so it's fine. The race condition comes in when the order the things run in matters. When I say matters, it means you get a different answer. How is that possible? Because there are three different operations, they could happen at the same time. They could interleave.

Here's an example of a different answer you could get by running these things in two different threads. Thread A could read the number 17, and then thread B could read the number 17, then both of them add one. Now they both have 18. They both write an 18 to that variable. The answer after both of them have counted is only one more.

We've lost a tomato. That's a race condition because it's like a race, who gets there first, what order these things happen in. It all just depends on luck. If you're counting a lot of tomatoes, it's probably going to happen, that the two threads are going to hit at the same time.
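The lost tomato can be replayed deterministically. The sketch below is only a simulation, not real threading: each "thread" is a Python generator that pauses after every step, and a schedule string such as `"ABABAB"` decides which one takes the next step. The names `increment_steps` and `run` are invented for this example.

```python
def increment_steps(shared):
    """One increment, pausing after the read and after the add."""
    value = shared["tomato_count"]   # read
    yield
    value = value + 1                # add
    yield
    shared["tomato_count"] = value   # store

def run(schedule, start=17):
    """Advance thread A or B one step at a time, in the given order."""
    shared = {"tomato_count": start}
    threads = {"A": increment_steps(shared), "B": increment_steps(shared)}
    for name in schedule:
        next(threads[name], None)    # take one step; ignore a finished thread
    return shared["tomato_count"]

print(run("AAABBB"))  # 19: A finishes, then B
print(run("BBBAAA"))  # 19: B finishes, then A
print(run("ABABAB"))  # 18: both read 17, both store 18
```

The only thing that changes between the three runs is the order of the steps, which is exactly what makes the result a matter of luck in a real program.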

If you've got multiple threads, if you've got 10 threads counting, it's even more likely that they're going to occur at the same time. What's unfortunate is that, very often, when you're testing, you're testing fewer threads and not as many iterations. You might not even see this happen in your tests or maybe you see it very occasionally.
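For a version with real threads, here is a rough sketch using Python's `threading` module, with 10 threads and an arbitrary 10,000 increments each. Whether any counts are actually lost depends on the interpreter, the machine, and timing, so a given run may well come out correct, which is the testing problem described above.

```python
import threading

tomato_count = 0  # shared global variable

def count_tomatoes(n):
    """Increment the shared counter n times with no coordination."""
    global tomato_count
    for _ in range(n):
        value = tomato_count   # read
        value = value + 1      # add
        tomato_count = value   # store

THREADS, PER_THREAD = 10, 10_000   # arbitrary sizes for the sketch
workers = [threading.Thread(target=count_tomatoes, args=(PER_THREAD,))
           for _ in range(THREADS)]
for w in workers:
    w.start()
for w in workers:
    w.join()

expected = THREADS * PER_THREAD
# The result can never exceed expected; it may fall short if updates were lost.
print(f"expected {expected}, got {tomato_count}")
```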

It's so uncommon that you just rerun the tests and they pass now, and you ignore it. It is one of the unfortunate things about moving from a sequential execution model to a parallel execution model. A single sequential program can do this, can do that ++ operator with no problem.

There was no problem at all, but once you move into a parallel world with multiple threads sharing resources, you start to get race conditions. That's something you have to think about and avoid. I said before that this can also happen if you have multiple machines communicating. Very likely, you do already have multiple machines.

You have probably a web browser pointed at your web server, multiple web browsers at the same time, all sending you web requests. That's already multiple machines. How could this cause a race condition? You could have the same thing.

If you have a button on your site or, let's say, an old-school hit counter, a visit counter, it's a little image. Every time someone requests that image from your server, it increments the count by one, and then serves you the new image of...it's just a little number.

If you have two people make a request at the same time in your parallelized web server, then there could be a race condition between them if you're not careful. Meaning, two requests come in at the same time and you only count it as one. You lose one. What should happen is one person gets one more and the next person gets two more.

That's one and one makes two. This can happen not only in a...Wait, let me talk about how this happens on the back end. Let's say you're storing the number of hits in the database. When a web request comes in for that image, you read in. You do a SQL query on the database.

Tell me how many hits we've had, then you add one to that, and then you update the field with that new number. Then serve the image out. If both threads or two machines or what have you, two processes, it doesn't really matter how architecturally it works as long as two things are happening at the same time.

They're both reading from the database at the same time. They get the same answer. They add one, and then they write it back. One of them is going to overwrite the other, but they're the same number. You've lost the count. Race conditions happen because, in a sequential program, the model of time is very easy.

One thing happens at a time, one after the other. Once you have multiple timelines with things happening at the same time, now you've got race conditions. Those timelines can be because you have multiple threads, or you've got multiple machines communicating, where each one is a sequential thread of execution.
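The hit counter on the back end can be sketched the same way. This example uses Python's built-in `sqlite3` module with an in-memory database. The `counter` table and the helper names are hypothetical, and the two simultaneous requests are simulated by doing both reads before either write.

```python
import sqlite3

# Hypothetical schema: a single row holding the number of hits.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE counter (hits INTEGER)")
db.execute("INSERT INTO counter (hits) VALUES (0)")

def read_hits():
    """Ask the database how many hits we've had."""
    return db.execute("SELECT hits FROM counter").fetchone()[0]

def write_hits(n):
    """Update the field with the new number."""
    db.execute("UPDATE counter SET hits = ?", (n,))

# Two requests arrive together: both read before either one writes.
seen_by_a = read_hits()      # request A sees 0
seen_by_b = read_hits()      # request B also sees 0
write_hits(seen_by_a + 1)    # A writes 1
write_hits(seen_by_b + 1)    # B overwrites it with the same 1

hits = read_hits()
print(hits)  # 1, even though two requests were served
```

It makes no difference whether the two timelines are threads, processes, or separate machines; the read and the write are separate steps, so another timeline can slip in between them.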

Even in JavaScript, you've got asynchronous callback chains. You have a callback for a web request, and then that does another web request and it has a callback for another web...That becomes a chain, a timeline. That chain becomes a timeline. One thing's happening after the other.
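A chain like that can race even on a single thread. The sketch below uses Python's `asyncio` as a stand-in for JavaScript callbacks, which is an assumption made to keep the examples in one language. The `fetch` coroutine is a placeholder for a web request and `hits` is a made-up shared variable.

```python
import asyncio

hits = 0  # shared resource touched by both chains

async def fetch():
    """Placeholder for a web request; hands control back to the event loop."""
    await asyncio.sleep(0)

async def handle_visit():
    global hits
    seen = hits        # read the shared value
    await fetch()      # while this chain waits, the other chain gets to run
    hits = seen + 1    # store, based on a value that may now be stale

async def main():
    # Start two chains at the same time.
    await asyncio.gather(handle_visit(), handle_visit())

asyncio.run(main())
print(hits)  # 1: both chains read 0 before either stored
```

Each chain is sequential on its own, but the pause in the middle lets the other timeline read the same old value.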

You could easily find race conditions in those if you have shared resources. That's another thing. You need shared resources to have a race condition. The two timelines need to be accessing something in common. In the first example counting tomatoes, the shared resource was that global variable.

In the hit counter example, the shared resource was that value stored in the database. Those are the two conditions: two sequential timelines running simultaneously, and they're sharing resources. Then you can have a race condition. How do you avoid race conditions?

That is a much more complicated answer than I'm prepared to give right now because I'm already going over. I'm going to leave that for another episode of Thoughts on Functional Programming. Please subscribe and you'll hear that answer when it publishes.

Also, if you have any questions, you want to get into a discussion, you want an episode about something that you're interested in, just let me know. You can reach me at eric@lispcast.com, that's my email. On Twitter, I'm @ericnormand with a D. You can also find me on LinkedIn. I'm Eric Normand.

You can search for me if you can't find my URL, Eric Normand Clojure. I should be up at the top. I'll see you next time. Bye.