Clojure Gazette 1.37
Issue 1.37 --- April 12, 2013
I really like parsing. Parsing is perhaps the best example of a declarative programming success story. We use them every day (regular expressions).
Despite their contemporary ubiquity, parsing had very humble beginnings. The original parsers were anything but declarative---written in assembly or machine code. Since we take this well-studied area for granted, I wanted to get a feeling for how undeveloped the field was until very recently, as well as a sense of the ease and variety available today.
P.S. Feel free to email me any time. I love hearing from readers.
This historic paper is the beginning of BNF. Starting on page 12, a formal specification for ALGOL was defined in order to resolve differences of interpretations among the ALGOL implementations. The basics of this format have stuck with us, and nearly every grammar is specified in a modified form of BNF.
META II is a cool little grammar language that can define itself in a tiny amount of code. Check out Long Nguyen's bootstrapped implementation .
Brian Kernighan shows how a simple regular expression matcher can be written which both easily demonstrates the idea and covers most of what regular expressions are used for.
Nate Young presents parser combinators, which are a functional way of defining parsers that take a very declarative form. He demos monadic parsing, which could be a use for monads. Finally, he presents Parsatron , his translation of Haskell's Parsec into Clojure. **
Parsley is Christophe Grand's parsing DSL. It is a great example of an internal DSL (using Clojure data structures) and has some interesting properties, including being incremental, which means that it can reparse text that has changed in a shorter time than parsing it from scratch.
I cannot resist linking to my own parsing combinator library, which is based on PEGs (more here ). It has combinators and a DSL. The DSL could use some work, but it is completely based on Clojure data structures (not strings).
The interesting thing about squarepeg is that it can parse any Clojure seq, even seqs of seqs. It was an attempt to bring the declarative grammar definitions to any kind of data.
Instaparse is a brand-new parsing system for Clojure written by Mark Engelberg. It asks the ambitious question:What if context-free grammars were as easy to use as regular expressions?
What has resulted is an easy to use and practical parsing system. Give it a look.