At the recent Thingmonk conference in London, I got to chatting about Clojure and ClojureScript. It’s an IoT event, so it was to be expected that the topic of performance on devices like the Tessel or Raspberry Pi would come up.
That got me thinking about the piece I wrote back in January about ClojureScript performance for Processing sketches. How much has ClojureScript performance improved since?
Let’s find out.
Here’s a summary of the original article.
I was working with Processing sketches in ClojureScript, and decided to see how far I could push a sketch that processed a lot of agents per frame.
The sketch initializes a number of circle agents with a random radius and movement direction, and then every frame we:
The first thing to check was how much ClojureScript performance had improved by itself, without my tweaking the code at all. For ease of reference, these were the performance numbers back then (on frames per second for a number of agents):
Circles | CoffeeScript | ClojureScript |
---|---|---|
100 | 38 | 27 |
150 | 34 | 6 |
170 | 32 | N/A |
200 | 25 | N/A |
350 | 8 | N/A |
The N/A on the ClojureScript column is because at that agent volume, the CLJS version became a very slow slide show.
To test current performance, I upgraded to ClojureScript 1.7.170 but left Quil at 2.2.4, which is the version that was available back in January. That allows me to control for any improvements Quil may have had, even though profiling had not shown it to be the bottleneck - garbage collection and iterations were.
The tests were a pleasant surprise: I could now run it effectively all the way up to 350 circles, even if it was still slower than the CoffeeScript version across the board.
Circles | CoffeeScript | CLJS 2015-01 | CLJS 2015-12 |
---|---|---|---|
100 | 38 | 27 | 30 |
150 | 34 | 6 | 27 |
170 | 32 | N/A | 16 |
200 | 25 | N/A | 12 |
350 | 8 | N/A | 3 |
While slower, these numbers shown a marked improvement - for starters, we can still run it at higher agent volume! Now that things are getting closer to a fair fight, how much could we optimize it, and how much effort would it require?
First, let’s do a quick refresher on the handicaps that ClojureScript is facing on this bout:
For the record, I am not trying to go pedal-to-the-metal here. In fact, I am trying to improve performance while touching as little code as possible. My aim is to have the program retain its original shape.
Let’s look at the low-hanging fruit first. calculate-overlaps
iterates over the entire circle list, calculating the overlap for each circle with the remaining ones, and returns a list of the overlap collections.
1 | (defn calculate-overlaps [circle-list] |
As we can see it’s doing this with a regular vector, which we can easily turn into a transient one and persist it once we’re done.
1 | (defn calculate-overlaps [circle-list] |
Done! Testing with 200 circles, this has given us a performance bump of…. drum roll… 1 frame per second.
We just went from 12 to 13 frames per second.
Clearly that’s not where our GC hit is coming from, which makes sense seeing as how we’re “only” recurring 200 times on that case.
On to the next suspect.
We then have calculate-circle-overlaps
, which calculate-overlaps
above calls for each circle. In its original version:
1 | (defn calculate-circle-overlaps |
That’s easy enough to read. We are taking a circle, then calculating the amount of overlap with every circle in the list. Finally, we filter it to return only those items where the circles do overlap.
Hmm. Filtering. That means we’re throwing away stuff.
If you look at the function I’m mapping, it always calculate the midpoint between both circles being evaluated, and returns a hash-map with the information even when it’s going to be discarded later. That’s wasteful.
Waste is bad.
1 | (defn calculate-circle-overlaps |
This new version should be much leaner, as it returns the values for the overlap amount and midpoints only when we’ll actually use them, returning nil
otherwise.
Running it, for 200 agents I now get 19 frames per second. Woot! That’s a 50% performance improvement from the last one, just by not creating hash-maps for values we were going to discard anyway.
On my original review, I had commented that using datatypes for the circle information improved performance as much as 300%. I’m still using a hash-map for the points, which means I could be pushing that up further with a trivial change.
Simply defining
1 | (deftype Point [x y]) |
and using that everywhere instead of the hash-map for mid-point
gave us yet another performance boost, this time from 19 to 22 frames per second. That’s almost the same framerate we were getting from the CoffeeScript version.
Interestingly, changing the entire overlap item to a datatype did not yield any significant performance improvements.
Now, if you look at the calculate-overlaps
function above, you’ll see I’m conjoining the result of every overlap calculation into an accumulator.
Given that calculate-circle-overlaps
returns a collection, that means we end up with a vector of vectors, which requires us to call flatten
once we’re done.
1 | (defn update-with-midpoints [state] |
Looking at the documentation for it, turns out that the lazy version of flatten
is slow, and if you don’t need lazyness there is an eager version proposed with much better performance. It does use concat
, which Stuart Sierra advised against, but for our data volume and test purposes it’ll do.
1 | (defn eager-flatten [l] |
I replace the lazy one with the eager version… and now we have 27 frames per second for 200 circles.
Woha. With 200 agents, we are now actually faster than the CoffeeScript version, without any significant changes to the sketch’s structure.
Let’s run this with the same amount of agents we did before, and see what sort of performance improvements those few changes bought us.
Omitting the old, sad ClojureScript results:
Circles | CoffeeScript | CLJS 2015-12 | With tweaks |
---|---|---|---|
100 | 38 | 30 | 30 |
150 | 34 | 27 | 30 |
170 | 32 | 16 | 30 |
200 | 25 | 12 | 27 |
350 | 8 | 3 | 7 |
Well, that’s nice, but oddly consistent… And guess what? I had forgotten I was capping Quil’s framerate at 30 fps, since that wasn’t even relevant before. Increasing it to 60 yields:
Circles | CoffeeScript | CLJS 2015-12 | With tweaks |
---|---|---|---|
100 | 38 | 30 | 60 |
150 | 34 | 27 | 52 |
170 | 32 | 16 | 40 |
200 | 25 | 12 | 27 |
350 | 8 | 3 | 7 |
We have given the old CoffeeScript example a thorough trashing without even aiming for that, while remaining immutable and fully functional!
Bear in mind that I did only minor changes - I could have gone much further. For starters, I don’t actually need to flatten the sequence, since we could just as well assume we’re getting a list of lists when we draw the mid points. I can also save the reverse
on eager-flatten
, since I don’t care about the list being on any particular order; look into the application keeping its own state mutable instead of using functional mode; or use reducers for the data transformations.
The goal, however, was to retain the shape of the original code as much as possible - to have a small affected surface. And as you have seen, I accomplished that while vaulting over the mutable version.
That’s an amazing improvement, and it’s caused me to revise my initial assessment. We can use ClojureScript efficiently and with ease in contexts where its immutability still means we’re discarding a large amount of data every frame.
Can’t wait to try it on an small development platform.
Over at reddit /u/ryalla pointed out that I forgot about keep
, and
1 | (defn calculate-circle-overlaps |
can instead be written as
1 | (defn calculate-circle-overlaps |
This saves us having to iterate over it twice, and provides another bump on FPS, sending it for 200 agents from 27fps to 46fps.
Published: 2015-12-14