Virtual Threads in Clojure

Summary: How to use Project Loom from Clojure and take advantage of millions of threads.

Introduction

What is Project Loom?

Project Loom is the codename for a project that adds virtual threads to the JVM. Virtual threads are a lightweight concurrency mechanism that allows for more concurrency than regular threads and, therefore, uses the CPUs more effectively.

The project was officially named "Virtual Threads." It was preview-released in JDK 19, with significant revisions in JDK 20 and an official release in JDK 21.

What's the difference between concurrency and parallelism?

That's a great question. Concurrency means sharing the computer's limited resources (CPU, RAM, disk, network). Parallelism means adding more of a resource so more can happen simultaneously.

concurrency: share resources

parallelism: add resources

For example, concurrency is forming a queue to share pots of coffee at the office. Parallelism is adding more pots of coffee to share. If you have three pots of coffee, only three people can pour coffee at a time--parallelism is three. However, more people can share the coffee pots using a concurrency mechanism like a queue.

Why use virtual threads?

Virtual threads let us better share the computer's resources without using an asynchronous programming style. In other words, you get better performance with easier-to-understand code. Stack traces can be a problem with asynchronous code, but stack traces from virtual threads look just like those from regular threads.

What are virtual threads?

A JVM thread is a thin wrapper around an operating system (OS) thread, which is expensive. OS threads are so costly that you can easily crash your machine if you create too many, where "too many" is in the thousands.

OS thread: a thread managed by the operating system; expensive

Virtual threads, on the other hand, are much cheaper to create and are independent of operating system threads. This means you can create millions of them.

virtual thread: a thread managed by the JVM; cheap; create them in the millions

When you run a virtual thread, it mounts onto an OS thread the JVM manages, and it runs. When a virtual thread blocks, it parks and releases the OS thread to do other work. The virtual thread will be ready to run again when it unblocks.

mount: attach to an OS thread

block: wait for I/O or other blocking operation

park: release the OS thread

In many ways, virtual threads and OS threads behave similarly. They both allow you to share resources. The main difference is their cost. Virtual threads are much cheaper.
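To get a feel for how cheap they are, here's a rough sketch (assuming JDK 21 or newer) that starts 100,000 virtual threads, each sleeping for one second. Trying the same with OS threads would likely exhaust your machine:

```clojure
;; start 100,000 virtual threads that each sleep one second,
;; then wait for them all to finish
(let [threads (doall (repeatedly 100000
                                 #(Thread/startVirtualThread
                                    (fn [] (Thread/sleep 1000)))))]
  (run! (fn [^Thread t] (.join t)) threads))
```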

There is another crucial difference, however. Operating system threads are pre-emptive. That means the OS will interrupt a thread running too long to give another thread a chance to run. That's how the OS achieves fair sharing.

pre-emptive: the OS will interrupt the thread to allow another thread to run

Virtual threads, on the other hand, are not pre-emptive. If a virtual thread is doing a long calculation, it will continue until it finishes, never releasing the OS thread to do other work. A virtual thread can monopolize the OS thread it runs on. We'll talk more about how this affects how we use virtual threads in a minute. But it means your virtual threads should block frequently.

virtual threads should block frequently

Requirements for virtual threads

You should use JDK 21 (or newer) for virtual threads. To get versions of the JDK, visit Adoptium.

JDK 19 and JDK 20 had preview releases of virtual threads. JDK 21 has the official release.
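A quick REPL check to confirm your JVM supports virtual threads:

```clojure
;; check the running JDK version
(System/getProperty "java.version") ;; e.g. "21.0.2"

;; if this returns true, virtual threads work
(.isVirtual (Thread/startVirtualThread (fn []))) ;; => true on JDK 21+
```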

Simple example using virtual threads

Let's imagine we need to aggregate RSS feeds from three different sources. We could request each feed in sequence in a single thread. Here's how that would look:

(defn merge-three-requests []
  (let [feed1 (fetch-feed "http://example.com/feed1")
        feed2 (fetch-feed "http://example.com/feed2")
        feed3 (fetch-feed "http://example.com/feed3")]
    (feed-merge feed1 feed2 feed3)))

However, we'd like to do it in parallel to save time and make better use of the CPU and I/O resources. Let's use virtual threads!

;; create an Executor that creates a new virtual thread for each task
(import '(java.util.concurrent Executors))
(defonce executor (Executors/newVirtualThreadPerTaskExecutor))

(defn merge-three-requests []
  ;; execute each task in a virtual thread; .submit returns a future
  (let [feed1 (.submit executor #(fetch-feed "http://example.com/feed1"))
        feed2 (.submit executor #(fetch-feed "http://example.com/feed2"))
        feed3 (.submit executor #(fetch-feed "http://example.com/feed3"))]
    ;; wait for all three futures to finish
    (feed-merge @feed1 @feed2 @feed3)))

What shouldn't I do with virtual threads?

Virtual threads are excellent because they work much the same as OS threads. However, the few differences make all the difference in the world. Let's go over things you shouldn't do with virtual threads that are perfectly fine for OS threads.

Don't do long-running calculations

Virtual threads are non-preemptive. That means the JVM will not interrupt your virtual thread during CPU work the way the OS interrupts an OS thread. Don't use virtual threads if you're programming a task that primarily calculates something, hot loops, or generally runs on the CPU without blocking.

Another problem is that Clojure's refs and atoms retry in a hot loop when you modify their values (using alter and swap!). Although we haven't gotten official word yet, this probably means they're unsuitable for use in virtual threads, especially with the possible contention of millions of virtual threads. This advice also applies to the lock-free classes in java.util.concurrent.atomic.

Note that agents don't have this problem because they run their actions on a separate OS thread pool.

Big calculation alternative

If you want to do an extensive CPU-bound calculation in parallel, use something like clojure.core.reducers. That framework gives a Clojuresque method for exploiting the ForkJoin framework. It takes excellent advantage of multiple cores.
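For example, a minimal sketch of a parallel, CPU-bound fold with reducers:

```clojure
(require '[clojure.core.reducers :as r])

;; parallel sum of squares over a vector; r/fold splits the work
;; across the ForkJoin pool
(r/fold + (r/map (fn [x] (* x x)) (vec (range 1000)))) ;; => 332833500
```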

Hot loop alternatives

  1. Try not to poll and loop. Take advantage of blocking.
  2. If you have to loop, sleep for a time (even if short) on every iteration. Sleeping will park your virtual thread, giving other virtual threads a chance at the host thread.
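A sketch of the second option; `work-available?` and `do-work` are hypothetical placeholders:

```clojure
;; `work-available?` and `do-work` stand in for your own functions
(defn polling-loop []
  (while true
    (if (work-available?)
      (do-work)
      ;; nothing to do: a short sleep parks this virtual thread and
      ;; frees the host thread for other virtual threads
      (Thread/sleep 10))))
```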

ref and atom alternatives

I'm still not sure what we should be using. It is probably something with locks. My current thinking is to try using concurrent collections instead of persistent collections in an atom or ref. The downside is that they are mutable, so you cannot get a snapshot of the state. But for some algorithms, you might not need that.

Don't access synchronized blocks or methods

Another difference is that synchronized blocks or methods will block the underlying host thread. The JVM developers call this phenomenon pinning. Pinning with synchronized blocks seems to be a limitation of the JVM. Double-check the Javadocs for libraries you call from within virtual threads to ensure nothing is synchronized.

pinning: when a virtual thread blocks the host thread

synchronized methods and synchronized blocks pin the host thread

Clojure used synchronized blocks to implement lazy sequences and delay. Clojure 1.12 reworked them to use ReentrantLock.

Use -Djdk.tracePinnedThreads=full to get a warning when a virtual thread is pinned.
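For example, assuming an application packaged as `my-app.jar`:

```shell
java -Djdk.tracePinnedThreads=full -jar my-app.jar
```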

synchronized block alternatives

Rewrite synchronized code to use java.util.concurrent.locks.ReentrantLock.
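A minimal sketch of the rewrite, guarding a critical section with a ReentrantLock instead of a synchronized block:

```clojure
(import '(java.util.concurrent.locks ReentrantLock))

(defonce lock (ReentrantLock.))

(defn critical-section [f]
  ;; blocking on a ReentrantLock parks the virtual thread
  ;; instead of pinning the host thread
  (.lock lock)
  (try
    (f)
    (finally
      (.unlock lock))))
```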

Don't pool them

We usually pool OS threads because they are expensive to create. But we also pool them to limit how many tasks will run simultaneously. For instance, we might create a thread pool of size five to ensure only five requests hit our database simultaneously.

thread pool alternatives

We shouldn't pool virtual threads. They are cheap to create. When you need a new task to run, create a virtual thread, let it run to completion, then let it get garbage collected.

However, we should use a semaphore to limit simultaneous resource use. Semaphores are like locks, except locks only allow one thread at a time. Semaphores allow n simultaneous threads to run. Be sure to use the true fairness setting to ensure FIFO discipline and prevent starvation.

semaphore: allow n simultaneous threads to access a resource

(ns me.ericnormand.semaphore-example
  (:import (java.util.concurrent Semaphore)))

(defonce sem (Semaphore. 5 true)) ;; limit to 5 simultaneous and FIFO discipline

(defn db-request [query]
  (.acquire sem) ;; block until available
  (try
    (do-request query)
    (finally
      (.release sem)))) ;; make available at end

What should I do with virtual threads?

We've gone over the things we shouldn't do. You can do everything else. But it's good to reaffirm what you can do.

Read/write with blocking IO

The java.io classes for reading files, streams, etc., are all fine to use.

When you call (.read is) on an InputStream and it blocks, your virtual thread will unmount and allow another virtual thread to use the host thread.
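For example, a sketch that reads from a stream inside a virtual thread (the URL is hypothetical):

```clojure
(require '[clojure.java.io :as io])

(Thread/startVirtualThread
  (fn []
    (with-open [is (io/input-stream "http://example.com/feed1")]
      ;; .read blocks on I/O; the virtual thread unmounts while waiting
      (println (.read is)))))
```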

Access blocking resources in java.util.concurrent

All of the classes in java.util.concurrent that block (such as BlockingQueue) are suitable for virtual threads.

Call Thread.sleep

Thread.sleep is a static method on the Thread class. It is also good to call from a virtual thread. After the timeout, the thread will unmount from the host thread and re-mount later.

Small computations to coordinate

Although your virtual thread shouldn't hog the CPU, it can do some calculation work. For instance, you might have some code that takes from a blocking queue, looks at the message, figures out what queue to put it on next, and then puts it on that queue. That's a short-lived calculation.
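Here's a sketch of such a router; the queue names and the `:urgent?` key are hypothetical:

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

;; hypothetical queues for this sketch
(defonce inbox (LinkedBlockingQueue.))
(defonce priority-queue (LinkedBlockingQueue.))
(defonce normal-queue (LinkedBlockingQueue.))

(Thread/startVirtualThread
  (fn []
    (while true
      (let [msg (.take inbox)] ;; blocks and parks until a message arrives
        ;; a short-lived calculation: decide where the message goes next
        (.put (if (:urgent? msg) priority-queue normal-queue) msg)))))
```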

core.async channels

Yes, core.async channels block with the <!! and >!! functions, so you can still use them.
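A small sketch, assuming core.async is on the classpath:

```clojure
(require '[clojure.core.async :as async])

(def ch (async/chan))

;; >!! and <!! block, which parks the virtual thread, not the host thread
(Thread/startVirtualThread #(async/>!! ch :hello))
(async/<!! ch) ;; => :hello
```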

manifold deferred and streams

manifold deferred and streams block as well.

Virtual threads compared to java.lang.Thread and core.async/go blocks

I used to use core.async/go blocks because their async nature increased concurrency. However, now I'll reach for virtual threads where I used to reach for go blocks.

                                  Virtual threads   OS threads   go blocks
CPU-bound calculations                  No              Yes          No
Synchronized blocks and methods         No              Yes          No
Blocking IO                             Yes             Yes          No
Locks and other blocking things         Yes             Yes          No
Thread.sleep                            Yes             Yes          No
Works across closures                   Yes             Yes          No
Works with channels                     Yes             Yes          Yes
Works in CLJS                           No              No           Yes

How to create virtual threads

There are three ways to create virtual threads.

1. Executors

There's an Executor implementation that creates new virtual threads for each task submitted. Tasks are just functions of zero arguments. The executor returns a blocking future that will contain the return value of the task you submit.

(import '(java.util.concurrent Executors Callable))
(defonce executor (Executors/newVirtualThreadPerTaskExecutor))

;; call the .submit method with a 0-argument function; hint it as
;; Callable so the returned future carries the function's return value
(def f (.submit executor ^Callable (fn [] 4)))

(type f) ;; .submit returns a future

;; get the value with deref or the .get method
;; will block until the value is ready
@f
(.get f)

2. Thread/startVirtualThread

The Thread class has a static method that starts a new virtual thread given a task. This one does not have a future. Instead, this method returns the thread itself.

(Thread/startVirtualThread #(println "Hello"))

3. Thread/ofVirtual builder

If you want to configure your thread, you can set its name and exception handler before starting it using the thread builder.

(-> (Thread/ofVirtual) (.name "My Thread") (.start #(println "Wow")))
;; .unstarted returns the thread without starting it
(-> (Thread/ofVirtual) (.unstarted #(println "Wow")))

How to coordinate virtual threads

Coordinate virtual threads using any of the blocking mechanisms you would use for OS threads:

  • Queues, channels, and exchangers - for passing values between threads
  • Semaphores and locks - for coordinating access to shared resources
  • Barriers and latches - for coordinating the start and end of tasks
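For example, a sketch of passing a value between two virtual threads with a blocking queue:

```clojure
(import '(java.util.concurrent ArrayBlockingQueue))

(defonce q (ArrayBlockingQueue. 10))

(Thread/startVirtualThread #(.put q :ping))      ;; producer
(Thread/startVirtualThread #(println (.take q))) ;; consumer
```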

How to store state between virtual threads

Clojure is famous for its shared state primitives like atom and ref. We use them because they are thread-safe. However, they are typically unsuitable for virtual threads since they retry updates in a hot loop.

Instead, there are some alternatives we can use:

1. Single writers

You can use the hot-loop primitives if you have only one writer.

Here's an example. Let's say I wanted to signal to my virtual threads that they should stop looping. I could use an atom that stores a boolean.

(defonce keep-going? (atom true))
(defonce executor (Executors/newVirtualThreadPerTaskExecutor))

(dotimes [n 10]
  ;; loop with a sleep, so it's fine
  (.submit executor (fn []
                      (while @keep-going?
                        (println "Still alive!")
                        (Thread/sleep 1000)))))

;; signal to stop threads after 25 seconds
(.submit executor (fn []
                    (Thread/sleep 25000)
                    (reset! keep-going? false)))

When I want to stop the threads, I change the value of keep-going? to false. Since only one thread is writing to the atom, it's safe.

You could also use ref or volatile for single writers or for multiple writers that don't read. See examples of three use cases of volatile that apply here.

2. java.util.concurrent collections

The java.util.concurrent package has several thread-safe collections. These collections are mutable. However, they can be used to store intermediate state between virtual threads. I would use these only when I didn't need an intermediate snapshot in a consistent state. That means I would use them to collect intermediate results, wait until the end of the work, and then make a final, immutable snapshot.

Here's an example. Let's say I wanted to fetch a bunch of URLs in parallel. I could do this:

(import '(java.util.concurrent ConcurrentHashMap CountDownLatch))
(defonce executor (Executors/newVirtualThreadPerTaskExecutor))

(defn fetch-urls [urls]
  (let [results (ConcurrentHashMap.)
        latch (CountDownLatch. (count urls))]
    (doseq [url urls]
      (.submit executor (fn []
                          ;; store url/result in the map
                          (.put results url (slurp url))
                          ;; signal this thread is done
                          (.countDown latch))))
    ;; return the future from this virtual thread, which will contain the results
    (.submit executor (fn []
                        ;; wait for all the virtual threads to finish
                        (.await latch)
                        ;; dump the results into an immutable map
                        (into {} results)))))

@(fetch-urls ["http://example.com/1", "http://example.com/2", "http://example.com/3"])

3. Not using shared state

One exciting thing I'm thinking about is whether we use shared state only because we have long-running threads. If we have short-lived threads, we can return the values from each thread and have another thread collect them. There's no reason for all threads to write to the same state. For example, I could rewrite the previous example without any shared state:

(defonce executor (Executors/newVirtualThreadPerTaskExecutor))

(defn fetch-urls [urls]
  (.submit executor
           (fn []
             ;; start one virtual thread per url; force the lazy seq with
             ;; `doall` so all the threads start immediately
             (let [futures (doall (map (fn [url]
                                         (.submit executor #(vector url (slurp url))))
                                       urls))]
               (into {} (map deref) futures)))))

@(fetch-urls ["http://example.com/1", "http://example.com/2", "http://example.com/3"])

Links

Here are some links that you may find useful: