Tricks for Java interop
When I first started learning Java, it was very early in its evolution. It was a simple language with classes and methods. Over the years, they've added all sorts of stuff: inner classes, enums, generics, and varargs methods. These things continuously trip me up when writing Clojure. What's cool is that while Java has added lots of features as a language, the JVM bytecode has been very stable. These features are compiled into the same basic class/method scheme as everything else.
Luckily, Clojure gives us exactly what we need to consume classes and call methods. Over the years, I've had to deal with all of those Java features I listed above. It's not always obvious how they map. I still have to look these things up all the time. So I'm collecting these tricks here for myself and for you to refer to.
Java class and package names
Ok, this isn't really a Java language feature issue. But it's
important. Clojure allows characters in names that are not legal in
Java names. The most common problem with this is that hyphens in
Clojure names get translated into underscores. But there are many more
characters that need translating (?, !, etc.).
Clojure has a function called clojure.core/munge that will convert
Clojure-style names into their equivalent Java legal name. The
reverse operation is called clojure.main/demunge (also available as
clojure.repl/demunge). These fns actually wrap methods in the Clojure
compiler, so they're the same logic the compiler uses itself. It uses
a lookup table to know which characters are allowed and which need to
be translated. Refer to that table to see quickly how things will be
translated.
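Here's a quick REPL sketch of the translation. The names being munged are made up for illustration. Note that I'm calling demunge from clojure.main, and reading the lookup table from the compiler's public CHAR_MAP field, which is where I understand it to live.

```clojure
(require 'clojure.main) ; demunge lives in clojure.main (and clojure.repl)

;; Hyphens become underscores; other characters get spelled out.
(munge "my-cool-ns")   ; => "my_cool_ns"
(munge "empty?")       ; => "empty_QMARK_"
(munge "swap!")        ; => "swap_BANG_"

;; demunge reverses the translation.
(clojure.main/demunge "empty_QMARK_") ; => "empty?"

;; The compiler's lookup table is a map keyed by character.
(get clojure.lang.Compiler/CHAR_MAP \?) ; => "_QMARK_"
```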
Inner classes
If you use a lot of Java interop, you'll often find a class like
this one. It's called java.lang.Thread.UncaughtExceptionHandler. The
package is java.lang, and the classname is
Thread.UncaughtExceptionHandler. You know the Thread part is part of
the classname because, by convention, it starts with a capital T. But
now it means there's a . in the name.
This type of class is called a nested class (often loosely called an
inner class). It means that UncaughtExceptionHandler was defined
inside of the Thread class. That's not a problem. The problem is that
. (dot) is not a valid character in Java classnames! So how is this
thing named?
The answer is that the Java compiler replaces the .s (dots) between
nested class names with $ (dollar sign) in the compiled class name.
So, when you're referring to that class, you need to do the same
translation:
(reify Thread$UncaughtExceptionHandler
...)
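To make that concrete, here's a minimal runnable sketch. The promise named caught and the "boom" exception are just things I made up to show the handler firing. The last two lines show the same $ rule applied to an import of a different nested type, java.util.Map.Entry.

```clojure
;; A promise lets the calling thread observe what the handler received.
(def caught (promise))

(def handler
  (reify Thread$UncaughtExceptionHandler
    (uncaughtException [_ thread ex]
      (deliver caught (.getMessage ex)))))

;; Start a thread that throws, with our handler installed.
(doto (Thread. (fn [] (throw (RuntimeException. "boom"))))
  (.setUncaughtExceptionHandler handler)
  (.start)
  (.join))

(deref caught 1000 :timed-out) ; => "boom"

;; The same $ rule applies when importing a nested type.
(import 'java.util.Map$Entry)
(instance? Map$Entry (first {:a 1})) ; => true
```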
Enums
Java added Enums in Java 5. Enums are a fancy way to represent a choice between different constants. You get a new type (the enum) and you have premade instances of that type made for you. Of course, there's no bytecode for enums. Instances of the enum are implemented as static fields. So if you've got a Java enum like this:
enum DaysOfWeek{ MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY; }
You can refer to those enum constants in Clojure like this:
DaysOfWeek/TUESDAY
That's the standard static field access.
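DaysOfWeek above is a made-up enum, so here's a sketch using a real one that ships with the JDK, java.time.DayOfWeek (this assumes Java 8 or later). Besides the static fields, every enum also gets static values and valueOf methods, which you call like any other static method.

```clojure
(import 'java.time.DayOfWeek)

DayOfWeek/TUESDAY                 ; the constant itself, read from a static field
(.name DayOfWeek/TUESDAY)         ; => "TUESDAY"
(DayOfWeek/valueOf "TUESDAY")     ; look a constant up by its name
(count (DayOfWeek/values))        ; => 7

;; Each constant is a single shared instance.
(identical? DayOfWeek/TUESDAY (DayOfWeek/valueOf "TUESDAY")) ; => true
```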
Varargs methods
Java 5 also added varargs. That is, methods that take a variable
number of arguments. Specifically, the last parameter can have ...
(three dots) after the type and Java will accept any number of
arguments of that type, including zero.
An example is java.util.Formatter.format(). Here's the type signature:
public Formatter format(String format, Object... args)
Of course, the JVM bytecode does not know anything about this. The
Java compiler has to convert that into something the JVM can
handle. What it does is package up all of the arguments into an array,
and it passes that array as the argument. So, as far as the JVM is
concerned, that format() method just has two arguments: a String and
an array of Objects. Note that the array might be empty.
What that means to you, as a Clojure programmer, is that you have to build that array yourself, even if it's empty.
Building an empty array is easy with clojure.core/make-array:
(.format formatter "No need for args" (make-array Object 0))
But if you need some stuff in there, that's easy too with clojure.core/into-array:
(.format formatter "%d %d %d" (into-array Object [1 2 3]))
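Here are those two calls as a self-contained sketch you can paste into a REPL. I've also added a case the text doesn't cover: when the varargs parameter is declared with a more specific type, the array has to be of that type. I'm using java.nio.file.Paths/get, declared as get(String first, String... more), as my example of that; the path segments are made up.

```clojure
(import '(java.util Formatter)
        '(java.nio.file Path Paths))

;; Object... args: pass an Object array, even when there is nothing in it.
(let [formatter (Formatter.)]
  (.format formatter "No need for args" (make-array Object 0))
  (str formatter))                  ; => "No need for args"

(let [formatter (Formatter.)]
  (.format formatter "%d %d %d" (into-array Object [1 2 3]))
  (str formatter))                  ; => "1 2 3"

;; String... more: the array's element type must match the declared type.
(let [^Path p (Paths/get "a" (into-array String ["b" "c"]))]
  (.getNameCount p))                ; => 3
```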
Generics
Java 5 also introduced generics. What a release! Generics are
a static type system that lets you parameterize classes on other
classes. They're very commonly used for collections, where you want to
say you want a list of strings (written List<String>).
But here's the thing: the type inference and checking are done
entirely at compile time. The Java compiler erases the type
parameters, so the bytecode that runs makes no use of them. That means
if you need to consume these classes from Clojure, you don't have to
deal with the types in any different way. Just consider all the type
parameters to be Object, which is typically how they're represented in
bytecode anyway.
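A small sketch of what that looks like in practice. The var names and values are made up. The Java declarations in the comments show what the equivalent generic code would be. From Clojure you just construct the raw class.

```clojure
(import '(java.util ArrayList HashMap))

;; Java: List<String> names = new ArrayList<>();
(def names (ArrayList.))
(.add names "Ada")
(.add names 42)        ; accepted: nothing checks the element type at runtime
(vec names)            ; => ["Ada" 42]

;; Java: Map<String, Long> ages = new HashMap<>();
(def ages (doto (HashMap.) (.put "Ada" 36)))
(.get ages "Ada")      ; => 36
```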
Now, if you're creating a library for consumption by people writing
Java, you might be used to using gen-class. But gen-class does not
have any way to specify the generics type signatures. If you really
want to include that information, you can write an interface or class
in Java directly. Chas Emerick has a good answer on Stack Overflow. I have a lesson on how to include Java code in your Clojure projects.
Conclusions
I'm glad I've documented these tricks because I'm likely to encounter them again. These are frustrating quirks. But writing these down has also helped me appreciate the stability of the JVM bytecode. Java has added significant language features, but the JVM is basically the same. Amazing.
The JVM is full of these kinds of quirks. That's why I created the JVM Fundamentals for Clojure course. You can buy it to view online or to download. Or you can buy a membership and get the JVM course and all of the other courses on the site.
After talking about some quirks, I really want to explore one of the big advantages Clojure gets by running on the JVM. That thing is the highly-optimized garbage collector. Clojure generates a lot of garbage, but it's vacuumed up quickly by the JVM. That's what we'll explore next time.