PurelyFunctional.tv Newsletter 354: Tip: beware how many threads you start

Issue 354 - December 02, 2019

Property-Based Testing Course Launch 🚀

Folks, if you are reading this, there might just be time to get 50% off of the three Property-Based Testing courses before the price goes up forever.

I was planning to end the sale today, but I didn't have great descriptions for the courses before (and holidays got in the way), so I've decided to extend the sale for a couple more days. Wednesday is the absolute end of this sale.

The three courses for Property-Based Testing with test.check are:

  • Beginning: learn to let the computer write thousands of tests for you
  • Intermediate: gain more confidence in stateful code
  • Advanced: learn to test the untestable (parallel and distributed systems)

Each course builds on the last one. Along the way, you learn to build custom generators, test at four different times in development, and integrate with Clojure Spec. Wednesday morning I will wake up and double the prices of these courses. Buy now or forever pay more.

Clojure Tip 💡

beware how many threads you start

It's really easy in Clojure to start new threads. The JVM uses OS threads, so each one you create is about as efficient as a thread could be. However, how many threads can your OS handle?

A common way to start too many threads is to start a new thread per server request. For example, every time an HTTP request comes in, you spawn a thread to calculate part of the answer. If you get thousands of requests in a short time period, you'll create thousands of threads all running at the same time.

Consequences

The least problematic consequence of creating too many threads is that the JVM throws an OutOfMemoryError. This is meant to be an error that you can't recover from.

I've also had systems that did worse than this. If you continue to create more threads despite the OutOfMemoryError, you can crash your entire operating system, forcing a restart. That has happened to me before, and in fact happened while I was writing this newsletter.

;; do not run this
(dotimes [_ 400]
  (let [p (promise)
        f (reduce (fn [f _]
                    (future
                      @f
                      (Thread/sleep 100) ;;; @A
                      @f))
                  p (range 100))]
    (deliver p 1)))

In the code above, we're creating a chain of 100 futures, each waiting on the one before it, and we run that 400 times in a loop. Without the (Thread/sleep 100) on the line marked with @A, it runs fine and only a couple hundred threads get created. Normally, futures run in threads from a thread pool, so the threads get reused. But that pool has no upper limit on its size, so if the tasks take time (like 100 ms), the threads can't be reused fast enough, and thousands of threads get created.

What's more, the threads are getting created inside of other futures, so the OutOfMemoryError does not stop the outer loop. Threads keep getting created, and the OS decides to clean house. Forced reboot.

I have also seen cases where the threads do get created, but there are so many of them that most of the time is spent on scheduling. As far as my code was concerned, nothing was getting done. The threads were blocking and unblocking on each other, and all of that overhead dominated the small amount of work I asked them to do.
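If you want to watch this happen at a safe scale, you can inspect the pool behind future from a REPL. This is a minimal sketch: the names future-pool and live-thread-count are made up for illustration, and it assumes a stock Clojure setup where nobody has replaced the executor with set-agent-send-off-executor!.

```clojure
(import '(java.util.concurrent ThreadPoolExecutor)
        '(java.lang.management ManagementFactory))

;; The executor that backs `future` (and `send-off`). In stock Clojure this is
;; a cached thread pool. (Assumption: it has not been swapped out.)
(def ^ThreadPoolExecutor future-pool clojure.lang.Agent/soloExecutor)

(defn live-thread-count
  "Number of live threads in this JVM right now (hypothetical helper)."
  []
  (.getThreadCount (ManagementFactory/getThreadMXBean)))

;; A cached pool has no practical upper bound on its size.
(.getMaximumPoolSize future-pool) ; Integer/MAX_VALUE in a stock setup

;; Run one short future so the pool has at least one thread to report.
@(future :done)

(.getPoolSize future-pool) ; threads currently held by the future pool
(live-thread-count)        ; all live threads in the JVM
```

Checking these two numbers before and after a burst of work is a cheap way to see whether your futures are being reused or piling up.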

What to use instead

Just to be clear, futures are okay to use for very short tasks, as long as you don't chain many of them together. When futures are chained, each one blocks a thread while it waits for another.

If you need longer-running things to run in parallel, you should use an ExecutorService. An ExecutorService manages a thread pool and a queue of tasks to feed to the pool. It handles problems like threads dying (by replacing them) and other things you would normally have to handle on your own.

It's beyond the scope of this email to explain them, but I have a short tutorial here.

By using a fixed-size thread pool, you put a hard cap on the number of threads while still keeping your hardware busy. Tasks beyond the cap wait in the queue instead of spawning new threads.
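As a rough illustration of the idea (not a replacement for the tutorial), here is a sketch that uses Java's Executors/newFixedThreadPool through interop. The function name run-on-fixed-pool and the pool size of 4 are assumptions chosen for the example.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn run-on-fixed-pool
  "Runs each zero-argument function in `tasks` on a pool of at most
  `n-threads` threads and returns their results in order.
  (Hypothetical helper for illustration.)"
  [n-threads tasks]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n-threads)]
    (try
      ;; invokeAll queues every task and blocks until all have finished.
      ;; The pool never runs more than n-threads of them at once.
      (mapv (fn [^Future f] (.get f))
            (.invokeAll pool ^java.util.Collection tasks))
      (finally
        ;; Let the pool's threads exit once the work is done.
        (.shutdown pool)))))

;; 40 tasks that each sleep 100 ms, run on only 4 threads.
(run-on-fixed-pool 4 (for [i (range 40)]
                       (fn [] (Thread/sleep 100) (* i i))))
```

Clojure functions implement Callable, so they can be handed to the executor directly. In a real server you would typically create the pool once and share it, rather than building one per call as this sketch does.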

Awesome book 📖

Elixir in Action (affiliate link)

It's important to learn other languages, and it has been a while for me since I have learned something new. So I read this book about Elixir. The book is excellent and gives a good example of what Elixir and OTP give you to make systems more reliable. I was impressed by what is offered by the Erlang VM as well as with the presentation in the book. It takes a single, simple example and through the chapters makes it more and more robust.

First Annual PurelyFunctional.tv Survey! 📋

The first annual PurelyFunctional.tv Survey is still open. Your answer to this quick survey will help me understand how to improve my videos and help you master Clojure faster and more deeply.

There are only four questions. If you've watched any of my video content, please take a few minutes to fill this out. I appreciate any answer you can give. And a big "thank you" to everyone who has already submitted an answer.

Fill out the survey

The survey will run until the end of the week.

Clojure Tool 🔨

emacs-up

Emacs requires a bit of setup before it is really competitive as an editor in today's world. This starter package is minimal but gives you what you need to get started editing Clojure with Emacs. It also has nice installation instructions.

Book update 📖

I have approved the proofs for Grokking Simplicity Chapter 5. Expect it out any day now!

I'm currently working on Chapter 7, which is all about Stratified Design.

You can buy the book and use the coupon code TSSIMPLICITY for 50% off.

Clojure Challenge 🤔

Last week's challenge

The challenge in Issue 353 was to try to improve your editing experience in some way. I didn't get any submissions, but I hope people did find some improvements.

This week's challenge

Levenshtein distance

The Levenshtein distance measures the edit distance between two strings. That is, it is the minimum number of one-character changes (insertions, deletions, or substitutions) you need to make to turn one string into the other. The algorithm has a nice recursive definition on Wikipedia, which makes it easy to write in Clojure.

Your goal is to implement the Levenshtein distance as a function of two strings. It should return the edit distance.

Bonus: use memoization to make it more efficient, or use an iterative method.
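If you want a starting point, here is one possible sketch of the naive recursive definition. It is only an illustration, not an official solution: it is exponentially slow on long strings, and it deliberately leaves the memoized and iterative versions to you.

```clojure
(defn levenshtein
  "Naive recursive edit distance between strings a and b.
  Fine for short inputs, far too slow for long ones."
  [a b]
  (cond
    (empty? a) (count b)   ; insert everything left in b
    (empty? b) (count a)   ; delete everything left in a
    (= (first a) (first b)) (levenshtein (rest a) (rest b))
    :else (inc (min (levenshtein (rest a) b)           ; delete from a
                    (levenshtein a (rest b))           ; insert into a
                    (levenshtein (rest a) (rest b)))))) ; substitute

(levenshtein "kitten" "sitting") ; => 3
```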

As usual, please reply to this email and let me know what you tried. I'll collect them up and share them in the next issue. If you don't want me to share your submission, let me know.

Rock on!
Eric Normand