ClojureScript performance revisited

Ricardo J. Mendez

Dec 14 2015

clojure, clojurescript, iot, performance, processing, quil

Introduction

At the recent Thingmonk conference in London, I got to chatting about Clojure and ClojureScript. It’s an IoT event, so it was to be expected that the topic of performance on devices like the Tessel or Raspberry Pi would come up.

That got me thinking about the piece I wrote back in January about ClojureScript performance for Processing sketches. How much has ClojureScript performance improved since?

Let’s find out.

ClojureScript improvements

Here’s a summary of the original article.

I was working with Processing sketches in ClojureScript, and decided to see how far I could push a sketch that processed a lot of agents per frame.

The sketch initializes a number of circle agents with a random radius and movement direction, and then every frame we:

Update their positions,
Evaluate if any of the circles overlaps with any other, and finally,
When two overlap, draw a circle at the overlap center, with a radius of the overlap amount.

The first thing to check was how much ClojureScript performance had improved by itself, without my tweaking the code at all. For ease of reference, these were the performance numbers back then (on frames per second for a number of agents):

Circles	CoffeeScript	ClojureScript
100	38	27
150	34	6
170	32	N/A
200	25	N/A
350	8	N/A

The N/A on the ClojureScript column is because at that agent volume, the CLJS version became a very slow slide show.

To test current performance, I upgraded to ClojureScript 1.7.170 but left Quil at 2.2.4, which is the version that was available back in January. That allows me to control for any improvements Quil may have had, even though profiling had not shown it to be the bottleneck - garbage collection and iterations were.

The tests were a pleasant surprise: I could now run it effectively all the way up to 350 circles, even if it was still slower than the CoffeeScript version across the board.

Circles	CoffeeScript	CLJS 2015-01	CLJS 2015-12
100	38	27	30
150	34	6	27
170	32	N/A	16
200	25	N/A	12
350	8	N/A	3

While slower, these numbers shown a marked improvement - for starters, we can still run it at higher agent volume! Now that things are getting closer to a fair fight, how much could we optimize it, and how much effort would it require?

ClojureScript’s handicaps

First, let’s do a quick refresher on the handicaps that ClojureScript is facing on this bout:

The compiled CoffeeScript version is much closer to what you’d write in JavaScript, since transpiling ClojureScript to JavaScript does add a certain amount of code overhead. Nothing we can do about that.
This version is fully immutable, unlike the CoffeeScript version which does get to mutate its own data. That means we’re facing a significant garbage collection hit, and profiling shows that at 200 agents over 35% of our time is going to GC.
I’m also using Quil’s functional mode middleware, which would preclude keeping our own world state in a mutable list. I can change that if push comes to shove, but I’d rather see how much I can improve it while remaining fully functional.
I am not using local transient values yet. That can be easily remedied.

On to the improvements!

For the record, I am not trying to go pedal-to-the-metal here. In fact, I am trying to improve performance while touching as little code as possible. My aim is to have the program retain its original shape.

Circle overlap iteration

Let’s look at the low-hanging fruit first. calculate-overlaps iterates over the entire circle list, calculating the overlap for each circle with the remaining ones, and returns a list of the overlap collections.

(defn calculate-overlaps [circle-list]
  (loop [circle (first circle-list)
         others (rest circle-list)
         acc []]
    (if (or (nil? circle) (empty? circle-list))
      acc
      (recur (first others)
             (rest others)
             (conj acc (calculate-circle-overlaps circle (rest others)))))))

As we can see it’s doing this with a regular vector, which we can easily turn into a transient one and persist it once we’re done.

(defn calculate-overlaps [circle-list]
  (loop [circle (first circle-list)
         others (rest circle-list)
         acc    (transient [])]
    (if (or (nil? circle) (empty? circle-list))
      (persistent! acc)
      (recur (first others)
             (rest others)
             (conj! acc (calculate-circle-overlaps circle (rest others)))))))

Done! Testing with 200 circles, this has given us a performance bump of…. drum roll… 1 frame per second.

We just went from 12 to 13 frames per second.

Clearly that’s not where our GC hit is coming from, which makes sense seeing as how we’re “only” recurring 200 times on that case.

On to the next suspect.

Overlap calculations

We then have calculate-circle-overlaps, which calculate-overlaps above calls for each circle. In its original version:

(defn calculate-circle-overlaps
  [circle circle-list]
  (->>
   (map 
     (defn f [c] 
       {:overlap-amount (circle-overlap-distance circle c) 
        :mid-point      (mid-point circle c)}) 
     circle-list)
   (filter #(< (:overlap-amount %) 0))))

That’s easy enough to read. We are taking a circle, then calculating the amount of overlap with every circle in the list. Finally, we filter it to return only those items where the circles do overlap.

Hmm. Filtering. That means we’re throwing away stuff.

If you look at the function I’m mapping, it always calculate the midpoint between both circles being evaluated, and returns a hash-map with the information even when it’s going to be discarded later. That’s wasteful.

Waste is bad.

(defn calculate-circle-overlaps
  [circle circle-list]
  (->>
    (map #(let [distance (circle-overlap-distance circle %)]
           (if (< distance 0)
             {:overlap-amount distance :mid-point (mid-point circle %)}
             nil))
         circle-list)
    (remove nil?)))

This new version should be much leaner, as it returns the values for the overlap amount and midpoints only when we’ll actually use them, returning nil otherwise.

Running it, for 200 agents I now get 19 frames per second. Woot! That’s a 50% performance improvement from the last one, just by not creating hash-maps for values we were going to discard anyway.

And speaking of waste…

On my original review, I had commented that using datatypes for the circle information improved performance as much as 300%. I’m still using a hash-map for the points, which means I could be pushing that up further with a trivial change.

Simply defining

1	(deftype Point [x y])

and using that everywhere instead of the hash-map for mid-point gave us yet another performance boost, this time from 19 to 22 frames per second. That’s almost the same framerate we were getting from the CoffeeScript version.

Interestingly, changing the entire overlap item to a datatype did not yield any significant performance improvements.

flatten

Now, if you look at the calculate-overlaps function above, you’ll see I’m conjoining the result of every overlap calculation into an accumulator.

Given that calculate-circle-overlaps returns a collection, that means we end up with a vector of vectors, which requires us to call flatten once we’re done.

(defn update-with-midpoints [state]
  (let [circles (:circles state)]
    (assoc state :circles (map update-circle-position circles)
                 :midpoints (->> (calculate-overlaps circles)
                                 (flatten)))))

Looking at the documentation for it, turns out that the lazy version of flatten is slow, and if you don’t need lazyness there is an eager version proposed with much better performance. It does use concat, which Stuart Sierra advised against, but for our data volume and test purposes it’ll do.

(defn eager-flatten [l]
  (loop [l1 l, l2 `()]
    (cond
      (sequential? (first l1)) (recur (concat (first l1) (rest l1)) l2)
      (empty? l1) (reverse l2)
      :else (recur (rest l1) (cons (first l1) l2)))))

I replace the lazy one with the eager version… and now we have 27 frames per second for 200 circles.

Woha. With 200 agents, we are now actually faster than the CoffeeScript version, without any significant changes to the sketch’s structure.

All together now

Let’s run this with the same amount of agents we did before, and see what sort of performance improvements those few changes bought us.

Omitting the old, sad ClojureScript results:

Circles	CoffeeScript	CLJS 2015-12	With tweaks
100	38	30	30
150	34	27	30
170	32	16	30
200	25	12	27
350	8	3	7

Well, that’s nice, but oddly consistent… And guess what? I had forgotten I was capping Quil’s framerate at 30 fps, since that wasn’t even relevant before. Increasing it to 60 yields:

Circles	CoffeeScript	CLJS 2015-12	With tweaks
100	38	30	60
150	34	27	52
170	32	16	40
200	25	12	27
350	8	3	7

We have given the old CoffeeScript example a thorough trashing without even aiming for that, while remaining immutable and fully functional!

Conclusion

Bear in mind that I did only minor changes - I could have gone much further. For starters, I don’t actually need to flatten the sequence, since we could just as well assume we’re getting a list of lists when we draw the mid points. I can also save the reverse on eager-flatten, since I don’t care about the list being on any particular order; look into the application keeping its own state mutable instead of using functional mode; or use reducers for the data transformations.

The goal, however, was to retain the shape of the original code as much as possible - to have a small affected surface. And as you have seen, I accomplished that while vaulting over the mutable version.

That’s an amazing improvement, and it’s caused me to revise my initial assessment. We can use ClojureScript efficiently and with ease in contexts where its immutability still means we’re discarding a large amount of data every frame.

Can’t wait to try it on an small development platform.

Corollary

Over at reddit /u/ryalla pointed out that I forgot about keep, and

(defn calculate-circle-overlaps
  [circle circle-list]
  (->>
    (map #(let [distance (circle-overlap-distance circle %)]
           (if (< distance 0)
             {:overlap-amount distance :mid-point (mid-point circle %)}
             nil))
         circle-list)
    (remove nil?)))

can instead be written as

(defn calculate-circle-overlaps
  [circle circle-list]
  (keep #(let [distance (circle-overlap-distance circle %)]
           (when (< distance 0)
             {:overlap-amount distance :mid-point (mid-point circle %)}))
         circle-list))

This saves us having to iterate over it twice, and provides another bump on FPS, sending it for 200 agents from 27fps to 46fps.