Reading Clojure

Preamble

Back in early December I held a Clojure workshop for the Bucharest Functional Programming meetup. Having to explain the language to developers who were completely new to it was an interesting experience. I decided to start from the ground up: how to even read the blasted thing.

Reading Clojure

I think what takes people the longest to get used to is how to read the language - particularly if they are already familiar with functional programming semantics. It took me a while to get used to it myself, and even longer to realize why that was.

When reading Java, or C#, we have these markers all over the place. I didn’t even think about them. I had gotten used to just scanning a page of code, paging down, my brain on something else, until my eyes caught on something. They spotted a public static and my brain came on, checked whether it was the function I cared about, and otherwise went back into sleep mode while I paged down.

public static int plus_one(int v) {
    return v + 1;
}

Clojure’s syntax is very compact. You don’t have these noisy markers when reading code, so you need to be paying attention. You’re going to have much less code to scan through, but your mind needs to be on.

It’s a pleasant change, but it requires practice. We learn by doing, but we also learn by reading what others have done. So I figure the best way to start is to describe how to read Clojure.

Luckily, reading Clojure is trivial. The difficult part actually comes from having to abandon bad habits and expectations of it being more complicated than it is.

Because that’s the second thing I learned: part of what made it difficult for me to read and write Clojure was that I was attempting to superimpose on it the more baroque constructs of the languages I was used to.

So let’s dive right in.

Lists and vectors

I thought about easing this in, and talking about code organization and functional programming and what not, but this is important. This is, in fact, the most important bit here.

This is a list: (1 2 3)

This is a vector: [1 2 3]

We’ll talk about how they are different in a bit - it’s mostly about how some functions treat them. For all intents and purposes, the list acts like you would expect a list to act, the vector like an array.

This is also a list: (+ 1 2 3)

This is also a vector: [+ 1 2 3]

That’s it. That is 90% of what you need to know.
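As a rough illustration of how some functions treat the two differently, here is a minimal sketch using conj, which adds an element wherever it is cheapest for the collection. The list is built with the list function so that nothing gets invoked by accident, and the names a-list and a-vector are made up for this example.

```clojure
;; Build one of each. (list ...) creates a list from its arguments.
(def a-list   (list 1 2 3))
(def a-vector [1 2 3])

;; conj adds to the front of a list, but to the end of a vector.
(def list-result   (conj a-list 0))    ; (0 1 2 3)
(def vector-result (conj a-vector 4))  ; [1 2 3 4]

;; Vectors support lookup by index, much like an array.
(def second-item (nth a-vector 1))     ; 2
```

Same function, same kind of call, different behavior depending on the collection it receives.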

Evaluation semantics

The question then is… if this is a list:

(1 2 3)

And this is also a list:

(+ 1 2 3)

Why does the second one get executed but the first one doesn’t?

Before we go any further, take a step back. Open a text file, make some notes as to the reasons why you’d expect the second line to be an invocation.

Ready?

OK.

In fact, the premise is a misconception: Clojure would try to evaluate both. When it encounters a non-empty list, it always treats it as something to invoke (a special form, a macro, or a function). The first element is what’s being invoked, the rest are its arguments.

So in the case of

(+ 1 2 3)

It will invoke the function + with the arguments 1, 2 and 3.

Period. Dead simple. That’s it. That’s pretty much all there is to reading Clojure.

As for the first example, if Clojure were to encounter (1 2 3) in the middle of a program, it would try to execute it, expecting 1 to be a function. That throws an error.

There’s a way to avoid that, called “quoting”. We’ll get to that later.
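A minimal sketch of both cases, for anyone who wants to try it at a REPL. It peeks ahead at quoting (the leading apostrophe) only to be able to hold the problematic list as data, and it uses eval to trigger the failure in a controlled way. The names total, plain-list, plus-as-data and failure are just for this example.

```clojure
;; A non-empty list is evaluated as an invocation: first element, then arguments.
(def total (+ 1 2 3))            ; 6

;; Quoting tells Clojure to treat the list as data instead of invoking it.
(def plain-list   '(1 2 3))
(def plus-as-data '(+ 1 2 3))    ; four elements; the first is the symbol +

;; Evaluating the unquoted (1 2 3) fails, because 1 is not something invocable.
(def failure
  (try
    (eval plain-list)
    (catch Exception e :not-a-function)))
```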

If you’re joining us from Java…

Now, if you are used to C# or Java or Scala, there’s a perfectly normal and reasonable reaction to this. There’s a part of your brain that’s nodding along, thinking “yeah, yeah, homoiconicity, sure”, and at the same time placing a huge asterisk around it.

That asterisk likely points to something saying “well, there’s special cases, like if we need to define a function, or ifs, or a loop. Those things have their own syntax that maybe smells like this, but is different”. Maybe that voice is fine with there being a special case, but it still thinks there is.

Let’s look at some examples to shake off this implicit assumption.

Examples

Let’s look at some cases.

or

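(or is-right is-kinda-right always-approve)

This is trivial, right? We just apply or to everything that comes right after it. Same thing as when we were adding. No surprise. (Strictly speaking, or is a macro rather than a function, which lets it stop at the first truthy value, but it reads exactly the same way.)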

In Java we’d have the or in the middle of every value, here we just front-load it. No biggie.

That’s because in Java || is an operator, same way that + is. Java does have some things that are inherent syntax, like the if.
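To make the or example concrete, here is a small sketch. The three values are hypothetical stand-ins for the names used above, and the atom is only there to show which arguments actually get evaluated.

```clojure
;; Hypothetical values standing in for the names in the example.
(def is-right false)
(def is-kinda-right nil)
(def always-approve true)

;; or returns the first truthy argument, or the last one if none is truthy.
(def verdict (or is-right is-kinda-right always-approve))   ; true

;; Arguments after the first truthy one are never evaluated.
(def evaluated (atom []))
(def short-circuit
  (or (do (swap! evaluated conj :first) :found)
      (do (swap! evaluated conj :second) :ignored)))        ; :found
```

After this runs, the atom holds only :first, since the second form was skipped.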

if

So let’s look at what an if is like in Clojure.

(if odd?
  (do-something [1 3 5 7])
  (do-another-thing [0 2 4 6]))

Aha! We can already see some different things here. Syntax, right? And in fact, if you’ve read about Clojure, you’ve seen that an if is what is called a special form.

Special forms aside

Which is true. Clojure does have something called special forms. One hears special forms and thinks “that’s the stuff that is different”.

It’s not. Not in syntax, and not in how the form is evaluated: the first element still names what is being invoked, and the rest are its arguments.

What I want you to hear when somebody says special forms is not special cases but primitives. They are things built directly into the language, not things that behave differently.

Keep that in mind whenever you see the term come up.

Back to decoding the if

Let’s decode that one for a bit.

(if odd?
  (do-something [1 3 5 7])
  (do-another-thing [0 2 4 6]))

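First of all, there’s the question mark in odd?. That’s not special syntax, but a naming convention for boolean values and for predicates that return one. (Here odd? stands for a local boolean value; in clojure.core the same name belongs to a predicate function.) It’s good form to use it, but not necessary.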

Then we have the structure that we expect. We have the thing to evaluate (odd?), followed by what to evaluate if true, and what to evaluate if false.

But this is actually not special syntax. The “branches” don’t even need to be “in parentheses”. Remember: that’s something we do only when we need to invoke something. In fact, there are no “branches”, only forms to evaluate in either case, which could just as well be plain values.

Therefore, this would be just as valid an if as the one above:

(if got-a-list?
  (convert-to-string [1 3 5 7])
  "Got something else")

If got-a-list? is true, then Clojure would evaluate the call that converts the vector to a string; otherwise it’d just return the string in the second case (a string evaluates to itself).
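Here is a runnable sketch of that if. The original convert-to-string is not shown, so the one below is a hypothetical stand-in built on str, and describe is an invented wrapper that lets us try both outcomes.

```clojure
;; Hypothetical helper; the real convert-to-string is not shown in the text.
(defn convert-to-string [coll]
  (str coll))

(defn describe [got-a-list?]
  (if got-a-list?
    (convert-to-string [1 3 5 7])   ; a form to invoke
    "Got something else"))          ; a plain value

(describe true)    ; "[1 3 5 7]"
(describe false)   ; "Got something else"
```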

Function declaration

Once you’ve grokked that, we should be ready to look at function declaration. Language shapes the way we think, so in order to do that, we should first consider how one’s mind is likely to describe code.

How we think about functions

Let’s go back to how we declare a trivial function in Java.

public static int plus_one(int v) {
    return v + 1;
}

Let’s go over how we’d describe what we are looking at here. This might seem like a trivial thing, but don’t skip over it.

  • We have the public keyword, telling us how available the function is,
  • We then have the static keyword, telling us if it belongs to a class or an instance,
  • Then there’s a type the function returns,
  • Then there’s the function name,
  • Then comes the parameter declaration, wrapped in parentheses,
  • Then there are braces wrapping a code block,
  • Then there’s the return keyword followed by what to return.

We really have no other way to do it. We need to do this descriptively, thinking in terms of the characters and glyphs that mark what we see. The parentheses, the braces, they have no use other than as syntax markers.

defn, round 1

Let’s now look at how we would do this in Clojure. Say, we want to add the “hello world” of functions and re-implement inc:

(defn plus-one [i]
  (+ i 1))

This doesn’t look too bad. Still, that part of your brain might think “well, that’s a special syntax that expects some things in order, like the parameters in brackets, or it gets a syntax error”.

Coming from Java, we might be tempted to read it like this:

  • We have the defn keyword,
  • Then the function name,
  • Then the parameter names in square brackets,
  • Then we just traded the braces around the body for parentheses.

Let’s disabuse that part of your brain of the notion right away.

This is just the exact same thing as when we were describing lists and vectors before. If we were to break this down into its elements, we would instead see it as:

(
  defn
  plus-one
  [i]
  (+ i 1)
)

  • defn is the name of what’s being invoked. Everything after it is an argument,
  • The first argument is a symbol specifying the name that the function will be bound to (plus-one),
  • Then there’s an argument that’s a vector with the symbols for the parameter names,
  • The rest are the forms of the function body, which are evaluated one by one when the function is called.

The result value of a function is the result value of the last item evaluated, in this case, (+ i 1).

But notice the fundamental difference there, which is easy to gloss over. The parameter names are not “in square brackets”, they are a basic data structure containing symbols. Whatever is receiving this symbol vector can manipulate it or pass it to other functions at will.
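One way to see this for yourself is to quote the whole definition and take it apart with ordinary sequence functions. This is only a sketch: the names form, invoked, fn-name, params and body are invented for the example.

```clojure
;; Quote the definition so we get the list itself instead of evaluating it.
(def form '(defn plus-one [i] (+ i 1)))

(def invoked (first form))    ; the symbol defn
(def fn-name (second form))   ; the symbol plus-one
(def params  (nth form 2))    ; the vector [i]
(def body    (nth form 3))    ; the list (+ i 1)
```

Nothing here is special to function definitions; these are the same list and vector operations you would use on any other data.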

In fact, defn is not even a special form, it’s not a primitive, and it’s not an inherent part of the language. It’s a macro.
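You can check this at a REPL. The sketch below uses three things from clojure.core: the :macro flag in a var’s metadata, special-symbol?, and macroexpand. The def names are made up for the example.

```clojure
;; defn is flagged as a macro in its var metadata.
(def defn-is-macro? (boolean (:macro (meta #'defn))))   ; true

;; if and def are special forms; defn is not.
(def if-is-special?   (special-symbol? 'if))     ; true
(def def-is-special?  (special-symbol? 'def))    ; true
(def defn-is-special? (special-symbol? 'defn))   ; false

;; Expanding the macro shows the primitive it is built on.
(def expanded (macroexpand '(defn plus-one [i] (+ i 1))))
(def expansion-head (first expanded))            ; the symbol def
```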

defn, round 2

How about this, then?

(defn get-from-string
  [conn long-url]
  (let [url    (db/get-url conn long-url)
        parsed (db/parse url)]
    (clean-up parsed)))

This is, once again, the exact same thing. Let’s break it down, with some nesting to differentiate between the parameter lists and what’s being invoked.

(
  defn
  get-from-string
  [conn long-url]
  (
    let
    [url (db/get-url conn long-url) parsed (db/parse url)]
    (clean-up parsed)
  )
)

Again, we have the defn macro, with the same form as before. This time its body is a single list whose first element is another macro, let, which receives as its first argument a vector of symbol/value pairs.
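To run something with this shape, here is a sketch with hypothetical stand-ins for the pieces the original does not show: get-url, parse-url and clean-up below are invented, and conn is assumed to be a plain map with a :base key.

```clojure
;; Hypothetical stand-ins for db/get-url, db/parse and clean-up.
(defn get-url [conn long-url] (str (:base conn) "/" long-url))
(defn parse-url [url] {:url url :length (count url)})
(defn clean-up [parsed] (select-keys parsed [:url]))

(defn get-from-string
  [conn long-url]
  (let [url    (get-url conn long-url)   ; first pair: the symbol url and its value
        parsed (parse-url url)]          ; second pair can already use url
    (clean-up parsed)))

(get-from-string {:base "https://example.com"} "some/page")
;; => {:url "https://example.com/some/page"}
```

The pairs in the let vector are bound in order, which is why parsed can refer to url.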

Let this sink in for a bit. Every case of Clojure code that you encounter will follow this exact same pattern. No matter what that code is or where it comes from, you can be 100% sure that the first value will be the item being invoked, and whatever comes after are arguments.

Even if you see a known function name further down the chain… guess what? That means a function is being passed as an argument.
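A short sketch of functions appearing as arguments rather than in the invoking position. The def names are only for illustration.

```clojure
;; map is what gets invoked; inc is just its first argument.
(def incremented (map inc [1 2 3]))   ; (2 3 4)

;; Likewise, + is handed to reduce, which does the invoking.
(def total (reduce + [1 2 3]))        ; 6

;; The first position can itself be a list that evaluates to a function.
(def twice-inc ((comp inc inc) 1))    ; 3
```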

Tao

This is perhaps the most fundamental thing here. If you take anything from this long spiel, please let it be this bit.

This here is the tao of Clojure syntax. There truly is nothing but the list. In this case, syntax and evaluation semantics are one and the same.

There is nothing that will deviate from this. Value assignment, function invocation, type declaration… it’ll all take the same form.
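As a quick sketch of that claim, here are the three cases side by side. answer, greeting, Point and origin are names invented for the example. def is a special form, str is a function and defrecord is a macro, yet all three read the same way.

```clojure
;; Binding a value: def is invoked with a symbol and a value.
(def answer 42)

;; Invoking a function: str is invoked with two strings.
(def greeting (str "hello, " "world"))

;; Declaring a type: defrecord is invoked with a name and a vector of fields.
(defrecord Point [x y])

;; ->Point is the constructor function that defrecord generates.
(def origin (->Point 0 0))
```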

Quiz!

OK, pop quiz! Are these lines valid Clojure?

(inc 1)
(inc)

Yes! Both are, as a matter of fact. They are both lists starting with a symbol of a function to invoke. Having said that, the second one will fail with an arity error when evaluated, as the function inc expects exactly one argument.
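The split between “reads fine” and “fails when evaluated” can be sketched with read-string and eval. The keyword returned from the catch is an arbitrary marker for this example, and the catch is deliberately broad, since the exact exception class is not something to rely on here.

```clojure
;; The reader accepts both: each is a list starting with the symbol inc.
(def good-form (read-string "(inc 1)"))
(def bad-form  (read-string "(inc)"))

;; Evaluating the first works.
(def good-result (eval good-form))   ; 2

;; Evaluating the second fails, since inc needs exactly one argument.
(def bad-result
  (try
    (eval bad-form)
    (catch Exception e :wrong-number-of-args)))
```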

Having established that inc is a function, how about these?

inc
[inc]
[inc 1]

These are syntactically valid Clojure as well.

  • inc is merely the symbol associated with the function.
  • [inc] is a vector whose first element is the function inc.
  • [inc 1] is a vector whose first element is the function inc, followed by the element 1.

If Clojure finds any of these, it won’t try to invoke them as it would with a list. It will merely evaluate the function reference, and in the case of the last two, will create a vector with these values. This is useful when you want to pass a function as a parameter to another function.
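A small sketch of what that looks like in practice. The names the-fn, pair, result and applied are just for illustration.

```clojure
;; A bare symbol evaluates to whatever it is bound to: here, a function.
(def the-fn inc)

;; A vector evaluates its elements but invokes nothing.
(def pair [inc 1])

;; Nothing runs until we pull the function out and put it first in a list.
(def result ((first pair) (second pair)))        ; 2

;; The same idea, handing the function to another function.
(def applied (apply (first pair) (rest pair)))   ; 2
```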

Having looked at that, would you then expect this to be syntactically valid Clojure?

(defn a-function [v]
  [+ 1 2 3 4 5]
  plus-one [inc]
  [inc 1]
  v)

And it is as well! We are just constructing a list of stuff, some of which are vectors and some of which are symbols. Now, is it semantically valid? As in, will it not only compile, but execute?

Yes, as long as plus-one from earlier is defined! Clojure will go down the body, form by form, and evaluate each one. The resulting value will be that of the last form evaluated, in this case v. That’s effectively a very noisy identity function.
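If you want to run it, the sketch below repeats plus-one so that the block stands on its own, and adds comments on what each body form does.

```clojure
(defn plus-one [i]
  (+ i 1))

(defn a-function [v]
  [+ 1 2 3 4 5]    ; builds a vector, which is then discarded
  plus-one [inc]   ; looks up a function, then builds another vector
  [inc 1]          ; and another
  v)               ; the last form is the return value

(a-function 42)        ; 42
(a-function "hello")   ; "hello"
```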

Of course, you’re not likely to encounter a function like that. I’m only writing something so odd-looking to eliminate two preconceptions:

  • First, that everything needs to be wrapped in parentheses. You don’t need to do that unless you’re creating a list or invoking something;
  • Second, that Clojure has all these syntactic special cases that other languages do, only done differently.

Wrapping up

Phew, that was a long read.

If you’re just coming to Clojure, I hope this has helped scrub off some deeply ingrained habits about how to read it. While I haven’t touched on many areas, like sets, keywords, hash maps or metadata, I thought that clarifying this first was more important. We should also go into how defn is not even a primitive, and is just a macro combining a few primitives in a single, convenient package, but we’ll get to that later.

Now go out and start getting acquainted with Clojure!

Author

Ricardo J. Méndez