Clojure Regex Tutorial
Summary: With a few functions from the standard library, Clojure lets you do most of what you want with regular expressions with no muss.
Clojure regexes are host language regexes
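Clojure doesn't define its own regex engine. On the JVM, a Clojure regex is a java.util.regex.Pattern, and in ClojureScript it is a JavaScript RegExp. Refer to the host's documentation for the regex syntax:
- JVM Clojure: the Javadoc for java.util.regex.Pattern
- ClojureScript: the MDN reference for JavaScript regular expressions
You can use Regex 101 to test regexes. Be sure to select the language in the menu in the top left. I also test them at the REPL.
Of course, because each host has its own engine, regexes are not always portable between platforms. Apart from the syntax and semantics of the regexes themselves, though, Clojure standardizes many regex functions across all platforms in the core library.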
Clojure regex syntax
You construct a regex in Clojure using a literal syntax. Strings with a hash sign in front are interpreted as regexes:
#"abc" ;; a regex that matches "abc"
This syntax is the most convenient because you don't need to double escape your special characters. For example, to write a regex that matches one digit as a plain Clojure string, you would need this:
"\\d" ;; regex string to match one digit
Notice that you have to escape the backslash to get a literal backslash in the string. Regex literals are smarter and don't need double escaping:
#"\d" ;; match one digit
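The following sketch checks that the two spellings build the same regex. It uses re-pattern from clojure.core, which compiles a regex from a string. The var names are only for illustration. Regex objects don't compare equal with =, so the sketch compares their source text instead.

```clojure
;; The same "one digit" regex, spelled two ways.
(def from-literal #"\d")             ;; regex literal: one backslash
(def from-string (re-pattern "\\d")) ;; plain string: the backslash is escaped

;; Pattern objects don't compare with =, so compare their source text instead.
(str from-literal)                       ;;=> "\\d"
(= (str from-literal) (str from-string)) ;;=> true

;; Both behave identically when matching.
(re-matches from-literal "7") ;;=> "7"
(re-matches from-string "7")  ;;=> "7"
```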
Matching a regex to a string with groups
Very often, you want to match an entire string. The function to do that in Clojure is called re-matches. It takes a regex and a string, then returns the result of the match.
(re-matches regex string) ;;=> result
The result it returns is a little complex. There are three things it can return.
1. No match returns nil
If the whole string does not match, re-matches returns nil, which is nice because nil is falsey.
(re-matches #"abc" "xyz")            ;;=> nil
(re-matches #"abc" "zzzabcxxx")      ;;=> nil
(re-matches #"(a)bc" "hello, world") ;;=> nil
2. Matching with no groups returns the matched string
If the string does match, and there are no groups (parens) in the regex, then it returns the matched string.
(re-matches #"abc" "abc")  ;;=> "abc"
(re-matches #"\d+" "3324") ;;=> "3324"
Since all strings are truthy, you can use re-matches as the test of an if:
(if (re-matches #"\d+" x)
  (println "x is all digits")
  (println "x is not all digits"))
We'll see a more convenient way to test and use the return value later in this section.
3. Matching with groups returns a vector
If it matches and there are groups, then it returns a vector. The first element in the vector is the entire match. The remaining elements are the group matches.
(re-matches #"abc(.*)" "abcxyz")       ;;=> ["abcxyz" "xyz"]
(re-matches #"(a+)(b+)(\d+)" "abb234") ;;=> ["abb234" "a" "bb" "234"]
The three different return types can get tricky. However, my regexes usually have groups, so the result is either a vector or nil, and both are easy to handle. I tend to use if-some. It evaluates the match, checks for nil, and destructures the groups. Because the destructuring form goes right in the binding, you name the groups in the same step as the test.
(if-some [[whole-match first-name last-name] ;; destructuring form
          (re-matches #"(\w+)\s(\w+)" full-name)]
  (println first-name last-name) ;; matching case
  (println "Unparsable name"))   ;; nil case
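Here is the same if-some pattern wrapped in a small function, as a sketch. parse-name and its map keys are hypothetical names chosen for this example. The regex only accepts exactly two runs of word characters separated by one whitespace character.

```clojure
(defn parse-name
  "Hypothetical helper: parses \"First Last\" into a map, or returns nil."
  [full-name]
  (if-some [[_whole-match first-name last-name]
            (re-matches #"(\w+)\s(\w+)" full-name)]
    {:first first-name :last last-name} ;; matching case
    nil))                               ;; re-matches returned nil

(parse-name "Ada Lovelace")   ;;=> {:first "Ada", :last "Lovelace"}
(parse-name "Cher")           ;;=> nil (no whitespace)
(parse-name "Mary Ann Smith") ;;=> nil (three words don't fit two groups)
```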
Finding a regex substring within a string with groups
Sometimes we want to find a match within a string. re-find returns the first match within the string. Its return values are similar to those of re-matches:
1. No match returns nil
(re-find #"sss" "Loch Ness") ;;=> nil
2. Match without groups returns the matched string
(re-find #"s+" "dress") ;;=> "ss"
3. Match with groups returns a vector
(re-find #"s+(.*)(s+)" "success") ;;=> ["success" "ucces" "s"]
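The practical difference between re-find and re-matches shows up when you run both on the same input. In this sketch the order string is made up. re-matches fails because the whole string isn't just a hash sign and digits, while re-find returns the first match it finds. Anchoring with ^ makes re-find look only at the start.

```clojure
(def order "Order #123, then #456")

(re-matches #"#(\d+)" order) ;;=> nil (must match the whole string)
(re-find #"#(\d+)" order)    ;;=> ["#123" "123"] (first match only)
(re-find #"^#(\d+)" order)   ;;=> nil (^ anchors to the start, which is "O")
```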
Finding all substrings that match within a string
The last function from clojure.core I use a lot is re-seq. It returns a lazy seq of all of the matches. The elements of the seq are whatever type re-find would have returned.
(re-seq #"s+" "mississippi")                 ;;=> ("ss" "ss")
(re-seq #"[a-zA-Z](\d+)" "abc x123 b44 234") ;;=> (["x123" "123"] ["b44" "44"])
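Because re-seq returns an ordinary seq, the usual sequence functions apply. The sketch below uses made-up input. With no groups, each element is a string. With groups, each element is a vector you can destructure. When nothing matches at all, re-seq returns nil rather than an empty seq.

```clojure
(def log-line "took 12ms, then 7ms, then 130ms")

;; No groups: each match is a string, so parse each one into a number.
(def durations (map #(Long/parseLong %) (re-seq #"\d+" log-line)))
durations ;;=> (12 7 130)

;; With groups: each match is a vector [whole key value].
(def pairs
  (into {} (map (fn [[_ k v]] [k v])) (re-seq #"(\w+)=(\d+)" "a=1 b=2")))
pairs ;;=> {"a" "1", "b" "2"}

;; No match at all gives nil.
(re-seq #"\d+" "no digits here") ;;=> nil
```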
Replacing regex matches within a string
Well, matching strings is cool, but often you'd like to replace a substring that matches with some other string. clojure.string/replace replaces all substring matches with a new string.
Do not confuse clojure.string/replace with clojure.core/replace, which replaces elements of a collection using a map. They are very different. I will often alias clojure.string as str in my ns form:
(ns my-app.core
  (:require [clojure.string :as str]))
That lets me refer to it as str/replace.
Here's a quick example:
(str/replace "mississippi" #"i.." "obb") ;;=> "mobbobbobbi"
This example matches an i followed by any two characters. It replaces all matches with the string "obb".
Notice the argument order. The string you are matching against comes first,
followed by the regex. Most functions in
clojure.string follow that pattern.
Since the functions are about strings, the strings are the first argument.
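clojure.string/replace also accepts a plain string or a character as the thing to match. In that case the match is literal rather than a regex, which matters when the text contains characters like the dot. A short sketch:

```clojure
(ns replace-demo
  (:require [clojure.string :as str]))

;; String match: "." is a literal dot.
(str/replace "a.b.c" "." "-") ;;=> "a-b-c"

;; Character match: also literal.
(str/replace "a.b.c" \. \-) ;;=> "a-b-c"

;; Regex match: . means any character, so every character is replaced.
(str/replace "a.b.c" #"." "-") ;;=> "-----"
```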
Referring to groups in the replacement string
clojure.string/replace is actually quite versatile. You can refer directly to the groups in the replacement string using a dollar sign:
- $0 means the entire match.
- $1 means the first group.
- $2 means the second group, and so on.
(str/replace "mississippi" #"(i)" "$1$1") ;;=> "miissiissiippii"
This example doubles every i in the string.
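Group references make it easy to reorder parts of a match. The flip side is that a literal $ in the replacement is read as a group reference. If a replacement comes from data, pass it through clojure.string/re-quote-replacement first. The following is a sketch with made-up strings:

```clojure
(ns groups-demo
  (:require [clojure.string :as str]))

;; Reorder year-month-day into day/month/year using $1, $2, $3.
(str/replace "2024-01-15" #"(\d+)-(\d+)-(\d+)" "$3/$2/$1") ;;=> "15/01/2024"

;; Without quoting, "$5" would be read as a reference to group 5, which doesn't exist.
(str/replace "cost: PRICE" #"PRICE" (str/re-quote-replacement "$5")) ;;=> "cost: $5"
```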
Calculating the replacement with a function
You can replace matches with the return value of a function applied to the match:
(str/replace "mississippi" #"(.)i(.)"
             (fn [[_ b a]]
               (str (str/upper-case b) "—" (str/upper-case a))))
;;=> "M—SS—SS—Ppi"
You can replace just the first occurrence with clojure.string/replace-first.
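Here is a quick sketch of clojure.string/replace-first on the same word. It takes the same kinds of arguments as str/replace, including group references, but stops after the first match:

```clojure
(ns replace-first-demo
  (:require [clojure.string :as str]))

(str/replace-first "mississippi" #"ss" "SS")    ;;=> "miSSissippi"
(str/replace-first "mississippi" #"(i)" "$1$1") ;;=> "miississippi"
```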
Splitting a string by a regex
Let's say you want to split a string on some character pattern, like one or more whitespace characters. You can use clojure.string/split:
(str/split "This is a string that I am splitting." #"\s+") ;;=> ["This" "is" "a" "string" "that" "I" "am" "splitting."]
Again, we see the same argument pattern: The string to match comes first, since
clojure.string functions are about strings.
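clojure.string/split also takes an optional limit. On the JVM it is built on Pattern.split, so it follows the same rules. A positive limit caps the number of pieces, the default drops trailing empty strings, and a negative limit keeps them. A sketch:

```clojure
(ns split-demo
  (:require [clojure.string :as str]))

;; At most 2 pieces; the remainder stays unsplit.
(str/split "key=value=with=equals" #"=" 2) ;;=> ["key" "value=with=equals"]

;; Trailing empty strings are dropped by default...
(str/split "a,b,," #",") ;;=> ["a" "b"]

;; ...but kept with a negative limit.
(str/split "a,b,," #"," -1) ;;=> ["a" "b" "" ""]
```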
Creating a case insensitive regex in Clojure (and other flags)
Some languages have syntax that lets you put modifiers on the regex, such as the i modifier, which makes the match case-insensitive. Here is an example in the /pattern/flags syntax used by JavaScript:
/jjj/i
This regex will match three j's regardless of case, so "jjj", "JJJ", and "jJj" all match. These modifiers are called flags.
Unfortunately, Clojure's regex literal syntax has no place for trailing flags like this. You have to rely on the host's own mechanisms for setting them.
1. JVM Clojure
On the JVM, there are two ways to use flags.
JVM Regex Flags Method 1: Special flag syntax
The JVM regexes allow for a special syntax to enable flags within the regex.
;; no flags (case-sensitive)
#"abc"     ;;=> #"abc"
;; case-insensitive flag set
#"(?i)abc" ;;=> #"(?i)abc"
These flags can be turned on and off partway through the regex. For instance:
#"ab(?i)cdef(?-i)ghi" ;;=> #"ab(?i)cdef(?-i)ghi"
The flag starts off, so ab is case-sensitive. Then the first (?i) turns it on, so cdef is case-insensitive. Then (?-i) turns it off (due to the minus sign), so ghi is case-sensitive again.
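To check how the embedded flags behave, here is a sketch that runs the mixed regex above against a few strings. Only the middle section ignores case:

```clojure
(def mixed #"ab(?i)cdef(?-i)ghi")

(re-matches mixed "abCDEFghi") ;;=> "abCDEFghi" (cdef ignores case)
(re-matches mixed "ABcdefghi") ;;=> nil (ab is case-sensitive)
(re-matches mixed "abcdefGHI") ;;=> nil (ghi is case-sensitive again)

;; A flag at the very start applies to the whole regex.
(re-matches #"(?i)abc" "AbC") ;;=> "AbC"
```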
You can even selectively turn them on or off in non-capturing groups:
#"ab(?iu:cdef)ghi" ;;=> #"ab(?iu:cdef)ghi"
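This turns on the i and u flags for just the cdef inside the group. The u flag is UNICODE_CASE, which extends case-insensitive matching beyond ASCII letters.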
The JVM regex flags syntax is quite powerful, and, if I had to guess, I would say that it's the main reason setting global flags using other syntax is hard.
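You can check the scoped form the same way. This sketch also shows what the u flag adds. On the JVM, i on its own only folds ASCII letters, so matching é against É needs u as well. The \u00e9 and \u00c9 escapes are é and É.

```clojure
(def scoped #"ab(?iu:cdef)ghi")

(re-matches scoped "abCdEfghi") ;;=> "abCdEfghi"
(re-matches scoped "abcdefGHI") ;;=> nil (ghi is outside the group)

;; i alone: case folding is ASCII-only, so é does not match É.
(re-matches #"(?i:\u00e9)" "\u00c9") ;;=> nil
;; i plus u: Unicode-aware case folding.
(re-matches #"(?iu:\u00e9)" "\u00c9") ;;=> "É"
```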
JVM Regex Flags Method 2: Create a regular expression by using the host classes
We will be using the java.util.regex.Pattern class, so we should import it for convenience:
(ns my-app.core
  (:import (java.util.regex Pattern)))
Now we can use it to compile a regex:
;; These two are equivalent:
#"abc"                  ;;=> #"abc"
(Pattern/compile "abc") ;;=> #"abc"
To add flags, we have to refer to them by their name. It's not very convenient to type, but here it is:
(Pattern/compile "abc" Pattern/CASE_INSENSITIVE) ;;=> #"abc"
- This makes a case-insensitive regular expression.
- The regular expressions using flags like this print the same as the regexes without flags.
- The flag applies to the entire regex.
- You can find out the flags on a regex using the .flags method.
- You will need to escape backslashes (\) twice since you're using a string literal, not a regex literal.
You can combine flags by adding them together, since each flag is a separate bit:
(Pattern/compile "abc" (+ Pattern/CASE_INSENSITIVE Pattern/UNICODE_CASE)) ;;=> #"abc"
It's not convenient to type, but at least it's explicit. You can read about the available flags in the java.util.regex.Pattern documentation.
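The following sketch inspects a compiled regex at the REPL. The printed form hides the flags, so .flags (a method on java.util.regex.Pattern) is the way to see them. CASE_INSENSITIVE has the bit value 2. The names ci and ci-unicode are just for illustration.

```clojure
(ns flags-demo
  (:import (java.util.regex Pattern)))

(def ci (Pattern/compile "abc" Pattern/CASE_INSENSITIVE))

(re-matches ci "ABC") ;;=> "ABC"
(str ci)              ;;=> "abc" (no sign of the flag)
(.flags ci)           ;;=> 2, the value of Pattern/CASE_INSENSITIVE

;; bit-or combines flags just like + does here, since each flag is its own bit.
(def ci-unicode
  (Pattern/compile "\u00e9" (bit-or Pattern/CASE_INSENSITIVE Pattern/UNICODE_CASE)))

(re-matches ci-unicode "\u00c9") ;;=> "É"
```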
There is a trick I've used to make escaping a little easier. You can write the regex as a regex literal (#""), then convert it to a string with str and pass that to Pattern/compile:
;; double escaped
(Pattern/compile "\\d" Pattern/CASE_INSENSITIVE)
;; more ergonomic
(Pattern/compile (str #"\d") Pattern/CASE_INSENSITIVE)
2. ClojureScript
In ClojureScript, regexes are JavaScript RegExp objects. If you don't need flags, you can construct one like this:
;; These two are equivalent:
#"abc"             ;;=> #"abc"
(js/RegExp. "abc") ;;=> #"abc"
To add flags, just add a second argument, a string containing the letter codes:
(js/RegExp. "abc" "iu") ;;=> #"abc"
Unfortunately, regexes with flags print the same as regexes without flags, so be careful.
Find whether a string contains another
I commonly use regexes to determine whether a string contains another string. That's easy to do with re-find:
(re-find #"needle" "Find a needle in a haystack.") ;;=> "needle"
(re-find #"needle" "Empty haystack.")              ;;=> nil
Because the return value is either truthy or falsey, you can use it as the condition of an if.
But if you're just doing a plain substring match (and not using fancy regex features like flags, character classes, and repetition), you can use clojure.string/includes?:
(str/includes? "Find a needle in a haystack." "needle") ;;=> true
(str/includes? "Empty haystack." "needle")              ;;=> false
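One thing to watch when switching between the two: re-find treats the needle as a regex, while str/includes? treats it as plain text. The following sketch shows the difference using a dot:

```clojure
(ns contains-demo
  (:require [clojure.string :as str]))

(re-find #"a.c" "abc")      ;;=> "abc" (. matches any character)
(str/includes? "abc" "a.c") ;;=> false (looks for a literal dot)
(str/includes? "a.c" "a.c") ;;=> true
```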
Regexes are also nice because you can anchor a match to the beginning or the end of the string:
(re-find #"^This string" "This string starts with ...") ;;=> "This string"
(re-find #"end$" "Find a string at the end")            ;;=> "end"
For these simple cases, clojure.string has dedicated functions too:
(str/starts-with? "This string starts with ..." "This string") ;;=> true
(str/ends-with? "Find a string at the end" "end")              ;;=> true
Remember, we commonly alias clojure.string as str in the ns form:
(ns my-app.core
  (:require [clojure.string :as str]))
Escaping regex characters in a string
Sometimes you have a string that contains some special characters that are meaningful as part of a regex.
"(??^$]" ;; A string I want to match literally
However, if you want to match those literally, you'll be in for a world of pain.
#"\(\?\?\^\$\]" ;; you can't escape the escapes!
The java.util.regex.Pattern class has a static method that's useful for quoting such strings:
(Pattern/quote "(??^$]") ;;=> "\\Q(??^$]\\E"
You can then pass the quoted string to Pattern/compile:
(-> "(??^$]" Pattern/quote Pattern/compile) ;;=> #"\Q(??^$]\E"
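This sketch puts the quoted pattern to work. The compiled regex matches the special characters literally, and re-pattern accepts the quoted string as well.

```clojure
(ns quote-demo
  (:import (java.util.regex Pattern)))

(def literal (Pattern/compile (Pattern/quote "(??^$]")))
(re-find literal "prefix (??^$] suffix") ;;=> "(??^$]"

;; Quoting a dot means it no longer matches any character.
(re-find (re-pattern (Pattern/quote "a.c")) "abc a.c") ;;=> "a.c"
```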
It lets you write this regex:
#"[\w.%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}"
As this edn:
[:cat [:+ [:class :word ".%+-"]] "@" [:+ [:class ["A" "Z"] ["a" "z"] ["0" "9"] ".-"]] "." [:repeat [:class ["A" "Z"] ["a" "z"]] "2" "4"]]
Check out this interactive tutorial.
Other rarely-used functions
Those are all of the functions I use routinely. There are some more, which are useful when you need them.
re-pattern
Construct a regex from a string, for example (re-pattern "\\d+"). This is how you build a regex at runtime.
re-matcher
This one is not available in ClojureScript. On the JVM, it creates a java.util.regex.Matcher, which is used for iterating over subsequent matches. For most iteration, re-seq is simpler.
If you find yourself with a Matcher, you can call re-find on it to get the next match (instead of the first). You can also call re-groups to get the groups from the most recent match, and use the Matcher's .group method to get named capture groups.
Unless you need a Matcher for some Java API, stick to the functions covered earlier. Matchers are mutable and don't work well with threads.
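Here is a sketch of stepping through a Matcher by hand on the JVM. Each call to re-find advances the same mutable Matcher. re-groups reads the most recent match without advancing, and the Matcher's .group method looks up a named group. The group names key and val are arbitrary. Java writes named groups as (?<name>...).

```clojure
(def m (re-matcher #"(?<key>\w+)=(?<val>\d+)" "a=1 b=22"))

(re-find m)      ;;=> ["a=1" "a" "1"]
(re-groups m)    ;;=> ["a=1" "a" "1"] (same match, no advance)
(.group m "val") ;;=> "1"

(re-find m)      ;;=> ["b=22" "b" "22"]
(.group m "key") ;;=> "b"

(re-find m)      ;;=> nil (no more matches)
```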